CN120371743A - On-demand code obfuscation of data in input path of object storage service - Google Patents

On-demand code obfuscation of data in input path of object storage service

Info

Publication number
CN120371743A
CN120371743A
Authority
CN
China
Prior art keywords
data
input data
function
service
store
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510446214.XA
Other languages
Chinese (zh)
Inventor
Ramyanshu Datta
Timothy Lawrence Harris
Kevin C. Miller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 16/586,816 (US Patent 11386230B2)
Priority claimed from US 16/586,825 (US Patent 11023311B2)
Priority claimed from US 16/586,818 (US Patent 10996961B2)
Application filed by Amazon Technologies Inc
Publication of CN120371743A
Legal status: Pending

Abstract

Inputs and outputs (I/O) to an object storage service are modified by applying one or more owner-specified functions to I/O requests. A function may implement a data manipulation, such as filtering sensitive data before the data is read or written. The function may be applied prior to implementing the request method (e.g., GET or PUT) specified within the I/O request, such that the data to which the method is applied may not match the object specified in the request. For example, a user may request to retrieve (e.g., GET) a data set. The data set may be passed to a function that filters sensitive data from the data set, and the GET request method may then be applied to the output of the function. In this way, owners of objects on the object storage service are given greater control over the objects stored in, or retrieved from, the service.

Description

On-demand code obfuscation of data in input path of object storage service
The present application is a divisional application of the patent application with application number 2020800734085, filed September 23, 2020, and entitled "On-demand code obfuscation of data in input path of object storage service."
Background
The computing devices may utilize a communication network to exchange data. Companies and organizations operate computer networks that interconnect many computing devices to support operations or provide services to third parties. The computing systems may be located in a single geographic location or in multiple different geographic locations (e.g., interconnected via a private or public communication network). In particular, a data center or data processing center (referred to herein generally as a "data center") may include a number of interconnected computing systems to provide computing resources to users of the data center. The data center may be a dedicated data center operated on behalf of an organization or a public data center operated on behalf of the public or for the benefit of the public.
To facilitate increased utilization of data center resources, virtualization techniques allow a single physical computing device to host one or more instances of virtual machines that are presented and operated as independent computing devices for users of the data center. Through virtualization, a single physical computing device may create, maintain, delete, or otherwise manage virtual machines in a dynamic manner. Further, a user may request computer resources from a data center, including configurations of individual computing devices or networked computing devices, and be provided with different amounts of virtual machine resources.
In addition to computing resources, data centers provide many other beneficial services to client devices. For example, a data center may provide a data storage service configured to store data submitted by client devices and allow retrieval of the data over a network. Various types of data storage services may be provided, typically as a function of their input/output (I/O) mechanisms. For example, a database service may allow I/O based on a database query language such as Structured Query Language (SQL). The block storage service may allow I/O based on modifications to one or more defined length blocks in a manner similar to how an operating system interacts with local storage, and thus may facilitate virtualized disk drives that may be used, for example, to store an operating system of a virtual machine. The object store service may allow I/O at the level of individual objects or resources (such as individual files) that may vary in content and length. For example, the object store service may provide an interface conforming to the representational state transfer (REST) architectural style, such as by allowing for I/O based on a call specifying input data and a hypertext transfer protocol request method (e.g., GET, PUT, POST, DELETE, etc.) to be applied to the data. By transmitting call and request methods specifying input data, the client can therefore retrieve data from the object store service, write data as new objects to the object store service, modify existing objects, and so forth.
Drawings
FIG. 1 is a block diagram depicting an illustrative environment in which an object store service may perform system operations in conjunction with on-demand code to implement functionality related to input/output (I/O) requests to the object store service;
FIG. 2 depicts a general architecture of a front-end computing device providing the object storage service of FIG. 1;
FIG. 3 is a flow chart depicting illustrative interactions for enabling a client device to modify an I/O path of an object storage service by inserting a function implemented via execution of a task on an on-demand code execution system;
FIG. 4 is an illustrative visualization of a pipeline of functions to be applied to the I/O path of the object store service of FIG. 1;
FIGS. 5A-5B show a flow chart depicting illustrative interactions for processing a request to store input data as an object on the object storage service of FIG. 1, including performing an owner-specified task on the input data and storing an output of the task as an object;
FIGS. 6A-6B show a flow chart depicting illustrative interactions for processing a request to retrieve data of an object on the object storage service of FIG. 1, including performing an owner-specified task on the object and transmitting an output of the task as the object to a requesting device;
FIG. 7 is a flow diagram depicting an illustrative routine for implementing an owner-defined function related to an I/O request obtained at the object store service of FIG. 1 via an I/O path, and
FIG. 8 is a flow diagram depicting an illustrative routine for executing tasks on the on-demand code execution system of FIG. 1 to implement data manipulation during implementation of an owner-defined function.
FIG. 9 is a flow diagram depicting an illustrative routine for executing tasks on the on-demand code execution system of FIG. 1 to execute a first function and a second function in response to storing data objects provided in multiple parts.
FIG. 10 is a system diagram of illustrative data flow and interaction between various components of the service provider system associated with the routine shown in FIG. 9.
FIG. 11 is a flow diagram depicting an illustrative routine for executing tasks on the on-demand code execution system of FIG. 1 to dynamically obscure portions of input data in response to storing the input data.
FIG. 12 is a system diagram of illustrative data flow and interaction between various components of the service provider system associated with the routine shown in FIG. 11.
FIG. 13 is a flow diagram depicting an illustrative routine for executing tasks on the on-demand code execution system of FIG. 1 to dynamically determine and store an index of content of input data in response to a request to store the input data.
FIG. 14 is a system diagram of illustrative data flow and interaction between various components of the service provider system associated with the routine shown in FIG. 13.
Detailed Description
In general, aspects of the present disclosure relate to processing requests to read or write data objects on an object storage system. More particularly, aspects of the present disclosure relate to modification of the input/output (I/O) path of an object storage service, such that one or more data manipulations may be inserted into the I/O path to modify the data to which a called request method is applied, without requiring the calling client device to specify such data manipulations. In one embodiment, a data manipulation is performed by executing user-submitted code, which may be provided, for example, by an owner of a collection of data objects on the object storage system in order to control interactions with those data objects. For example, if an owner of a collection of objects wishes to ensure that end users do not submit objects containing any personally identifiable information to the collection (to ensure the privacy of end users), the owner may submit code executable to remove such information from input data. The owner may further specify that such code should be executed during each write of a data object to the collection. Thus, when an end user attempts to write input data to the collection as a data object (e.g., via an HTTP PUT method), the code may first be executed against the input data, and the resulting output data may be written to the collection as the data object. Notably, this may result in the operation requested by the end user (such as a write operation) being applied not to the end user's input data, but to the data output by the data manipulation code (e.g., the code submitted by the owner). In this way, owners of data collections can control I/O to those collections without depending on end users to adhere to the owners' requirements. In fact, end users (or any other client devices) may be unaware that any modification of the I/O is occurring.
Thus, embodiments of the present disclosure are able to modify I/O to an object store service without modifying the interface to the service, thereby ensuring mutual compatibility with other pre-existing software that utilizes the service.
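To make the owner-specified write-path manipulation concrete, the following is a minimal sketch of such a function. The function name, the handler signature (a readable input stream and a writable output stream), and the SSN-like redaction pattern are all illustrative assumptions, not the patent's actual code; the key point is that the service would apply the PUT method to the function's output rather than to the caller's original bytes.

```python
import io
import re

# Hypothetical owner-submitted manipulation: redact SSN-like tokens from
# input data before the requested PUT method is applied.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(input_stream, output_stream):
    """Copy input to output, replacing SSN-like tokens with a placeholder."""
    for line in input_stream:
        output_stream.write(SSN_PATTERN.sub("[REDACTED]", line))

# The service would then store the *output* of the function, not the
# end user's original input, as the data object.
source = io.StringIO("name: Alice\nssn: 123-45-6789\n")
sink = io.StringIO()
redact_pii(source, sink)
```

Because the end user calls the same PUT interface as before, the redaction is invisible to the caller; only the stored object differs.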
In some embodiments of the present disclosure, data manipulation may occur on an on-demand code execution system (sometimes referred to as a serverless execution system). In general, an on-demand code execution system is capable of executing arbitrary user-specified code without requiring the user to create, maintain, or configure an execution environment (e.g., a physical or virtual machine) in which the code is executed. For example, whereas conventional computing services typically require a user to provision a particular device (virtual or physical), install an operating system on the device, configure applications, define network interfaces, and so on, an on-demand code execution system may enable the user to simply submit code and may provide the user with an application programming interface (API) that, when used, enables the user to request execution of that code. Upon receiving a call through the API, the on-demand code execution system may generate an execution environment for the code, provision the environment with the code, execute the code, and provide the results. Thus, an on-demand code execution system can remove the need for a user to handle configuration and management of environments for code execution. Example techniques for implementing an on-demand code execution system are disclosed, for example, in U.S. Patent No. 9,323,556 (the "'556 patent"), entitled "PROGRAMMATIC EVENT DETECTION AND MESSAGE GENERATION FOR REQUESTS TO EXECUTE PROGRAM CODE," filed September 30, 2014, the entirety of which is hereby incorporated by reference.
Because of the flexibility of an on-demand code execution system to execute arbitrary code, such a system may be used to create a variety of web services. For example, such a system may be used to create a "micro-service": a web service that implements a small number of functions (or only one function) and that interacts with other services to provide an application. In the context of an on-demand code execution system, the code executed to create such a service is often referred to as a "function" or a "task," which can be executed to implement the service. Accordingly, one technique for performing a data manipulation within the I/O path of an object storage service could be to create a task on an on-demand code execution system that, when executed, performs the required data manipulation. Illustratively, the task could provide an interface similar or identical to that of the object store service, being operable to obtain input data in response to a request method call (e.g., an HTTP PUT or GET call), execute the code of the task against the input data, and make a call to the object store service to implement the request method on the resulting output data. A disadvantage of this technique is its complexity. For example, under this technique end users might be required to submit I/O requests to the on-demand code execution system, rather than to the object store service, to ensure that the task is executed. If an end user submitted a call directly to the object store service, task execution would not occur, and the owner would therefore be unable to enforce the desired data manipulation for the object collection. Furthermore, this technique may require that the code authored for the task provide both an interface to end users that processes calls to implement request methods on input data, and an interface enabling calls from the task execution to the object store service.
Implementation of these network interfaces may significantly increase the complexity of the required code, thereby inhibiting use of the technique by owners of data collections. Furthermore, where user-submitted code directly implements network communication, the code may need to vary according to the request method being processed. For example, a first set of code may be required to support GET operations, a second set of code may be required to support PUT operations, and so on. Because embodiments of the present disclosure relieve user-submitted code of the need to handle network communications, in some cases a single set of code may support multiple request methods.
To address the above-described issues, embodiments of the present disclosure may enable strong integration of serverless task execution with the interface of an object store service, such that the service itself is configured to invoke a task execution on receipt of an I/O request for a data collection. Furthermore, generation of code to perform data manipulation may be simplified by configuring the object store service to facilitate data input and output from a task execution without requiring the task execution itself to implement network communications for I/O operations. In particular, in one embodiment, the object store service and on-demand code execution system may be configured to "stage" input data to a task execution in the form of a handle to an operating-system-level input/output stream (e.g., a POSIX-compliant file descriptor), such that the code of the task may manipulate the input data via defined stream operations (e.g., as if the data existed within a local file system). Such stream-level access to input data may be contrasted with, for example, network-level access, which generally requires the code to implement network communications to retrieve the input data. Similarly, the object store service and the on-demand code execution system may be configured to provide an output stream handle representing an output stream to which the task execution may write output. On detecting writes to the output stream, the object store service and the on-demand code execution system may treat such writes as the output data of the task execution and apply the called request method to that output data. By enabling a task to manipulate data via input and output streams passed to the task, rather than requiring the code to handle data communications over a network, the code of the task can be greatly simplified.
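The stream-handle arrangement can be sketched with ordinary OS pipes. This is a toy model, not the service's actual mechanism: the function and variable names are made up, and a real system would stream data incrementally rather than buffer it in a pipe. It shows the essential contract, though: the task code sees only plain file-like streams and performs no network I/O of its own.

```python
import os

def run_with_stream_handles(task, input_bytes):
    """Hand a task its input and output as OS-level stream handles (here,
    pipe descriptors), so the task manipulates data via ordinary stream
    reads/writes; the service applies the request method to the output."""
    in_read, in_write = os.pipe()
    out_read, out_write = os.pipe()
    os.write(in_write, input_bytes)   # service stages the input data
    os.close(in_write)
    with os.fdopen(in_read, "rb") as stdin, os.fdopen(out_write, "wb") as stdout:
        task(stdin, stdout)           # task sees plain file-like streams
    with os.fdopen(out_read, "rb") as result:
        return result.read()          # request method is applied to this

# Hypothetical task: normalize the object's contents to upper case.
output = run_with_stream_handles(lambda i, o: o.write(i.read().upper()), b"hello")
```

Note that the lambda contains no sockets, URLs, or protocol handling; that is the simplification the staged handles buy.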
Another benefit of enabling tasks to manipulate data via input and output handles is increased security. A generic on-demand code execution system may operate permissively with respect to network communications from a task execution, allowing any network communication from the execution unless such communication is explicitly denied. This permissive model reflects the use of task executions as micro-services, which commonly require interaction with a variety of other network services. However, the permissive model also decreases security, since potentially malicious network communications can also reach the execution. In contrast to the permissive model, task executions used to perform data manipulations on the I/O path of an object storage system may utilize a restrictive model, whereby only explicitly allowed network communications can occur from the environment in which a task is executed. Illustratively, because data manipulation may occur via input and output handles, it is envisioned that many or most tasks used to perform data manipulations in embodiments of the present disclosure will require no network communications at all, greatly increasing the security of such executions. Where a task execution does require some network communication (such as contacting an external service to assist with a data manipulation), such communication may be explicitly allowed, or "whitelisted," thus exposing the execution in only a strictly limited manner.
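The restrictive model above amounts to a default-deny policy with an explicit allow-list. The following sketch (host names are invented; a real implementation would enforce this at the network layer of the execution environment, not in Python) captures the decision rule:

```python
# Default-deny outbound policy for a task's execution environment:
# a connection is permitted only if its destination is explicitly listed.
ALLOWED_HOSTS = {"kms.example.internal"}  # hypothetical whitelisted service

def connection_permitted(host):
    """Return True only for explicitly allow-listed destinations."""
    return host in ALLOWED_HOSTS
```

For the common case of a pure stream-to-stream manipulation, the allow-list would simply be empty.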
In some embodiments, an owner of a data collection may require only a single data manipulation with respect to I/O of the collection. Accordingly, the object store service may detect I/O to the collection, implement the data manipulation (e.g., by performing a serverless task within an environment provisioned with input and output handles), and apply the called request method to the resulting output data. In other embodiments, an owner may request multiple data manipulations for an I/O path. For example, to increase portability and reusability, an owner may author multiple serverless tasks that can be combined in different ways on different I/O paths. Thus, for each path, the owner may define a series of serverless tasks to be performed on I/O to that path. Moreover, in some configurations, the object storage system may natively provide one or more data manipulations. For example, the object storage system may natively support requests for only a portion of an object (e.g., a defined byte range), or may natively support execution of a query (e.g., an SQL query) against the data of an object. In some implementations, any combination of the various native manipulations and serverless-task-based manipulations may be specified for a given I/O path. For example, for a particular request to read an object, an owner may specify that a given SQL query is to be executed against the object, the output of that query processed via a first task execution, the output of the first task execution processed via a second task execution, and so on. The collection of data manipulations (e.g., native manipulations, serverless-task-based manipulations, or a combination thereof) applied to an I/O path is generally referred to herein as a data processing "pipeline" applied to the I/O path.
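A pipeline of this kind is, in essence, an ordered composition of manipulations applied before the request method runs. The sketch below is illustrative only: the stages stand in for a native byte-range restriction and two task executions, whereas in the described system each task-based stage would run in its own provisioned environment.

```python
def apply_pipeline(data, stages):
    """Apply an ordered series of data manipulations (native or
    serverless-task-based) to data before the request method is applied."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical pipeline: native byte-range GET, then two task executions.
pipeline = [
    lambda d: d[:16],                            # native: byte-range restriction
    lambda d: d.replace(b"secret", b"******"),   # first task: redaction
    lambda d: d.strip(),                         # second task: normalization
]
result = apply_pipeline(b"secret report...  trailing", pipeline)
```

The ordering matters: each stage consumes the previous stage's output, exactly as the SQL-query-then-task example in the text describes.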
In accordance with aspects of the present disclosure, the particular path modification (e.g., the pipeline added) applied to an I/O path may vary according to attributes of the path, such as the client device from which an I/O request originates or the object or collection of objects referenced within the request. For example, a pipeline may be applied to an individual object such that the pipeline is applied to all I/O requests for the object, or selectively applied only when certain client devices access the object. In some cases, the object store service may provide multiple I/O paths for an object or collection. For example, the same object or collection may be associated with multiple resource identifiers on the object storage service, such that the object or collection can be accessed through multiple identifiers (e.g., uniform resource identifiers, or URIs), which illustratively correspond to different network-accessible endpoints. In one embodiment, a different pipeline may be applied to each I/O path for a given object. For example, a first I/O path may be associated with non-privileged access to a data set and therefore be subject to data manipulations that remove confidential information from the data set prior to retrieval, while a second I/O path may be associated with privileged access and therefore not be subject to those manipulations. In some cases, a pipeline may be selectively applied based on other criteria, such as the time of day or the number or rate of accesses to an object or collection.
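The privileged/non-privileged example can be modeled as a lookup from resource identifier to pipeline. The URIs, the mapping structure, and the redaction stage below are all hypothetical; the point is only that the same stored bytes yield different results depending on which I/O path the request arrived on.

```python
# Two hypothetical URIs reach the same stored object; only the
# non-privileged path carries a redacting pipeline.
PATH_PIPELINES = {
    "https://privileged.example/dataset": [],
    "https://public.example/dataset": [
        lambda d: d.replace(b"confidential", b"[removed]"),
    ],
}

def handle_get(uri, stored_bytes):
    """Apply the pipeline configured for this I/O path, then return the data."""
    data = stored_bytes
    for stage in PATH_PIPELINES.get(uri, []):
        data = stage(data)
    return data
```

Time-of-day or rate-based criteria would simply replace the static dictionary lookup with a predicate evaluated per request.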
As will be appreciated by one of skill in the art in light of the present disclosure, the embodiments disclosed herein improve the ability of computing systems, such as object storage systems, to provide and enforce data manipulation functions for data objects. Whereas prior techniques generally rely on external implementation of data manipulation functions (e.g., requesting that users remove personal information before uploading it), embodiments of the present disclosure enable data manipulations to be inserted directly into the I/O path of an object storage system. Moreover, embodiments of the present disclosure provide a secure mechanism for implementing data manipulations by providing for serverless execution of manipulation functions within isolated execution environments. Embodiments of the present disclosure further improve the operation of serverless functions by enabling such functions to operate on the basis of local stream (e.g., "file") handles, rather than requiring the functions to act as network-accessible services. The presently disclosed embodiments therefore address technical problems inherent in computing systems, such as the difficulty of implementing data manipulations at a storage system and the complexity of creating external services to implement such manipulations. These technical problems are addressed by the various technical solutions described herein, including the insertion of data processing pipelines into the I/O path of an object or collection of objects, potentially without knowledge of the requesting user, the use of serverless functions to perform aspects of such pipelines, and the use of local stream handles to enable simplified creation of serverless functions. Thus, the present disclosure represents an improvement on existing data processing systems and computing systems in general.
General execution of tasks on an on-demand code execution system will now be discussed. As described in detail herein, the on-demand code execution system may provide network accessible services enabling a user to submit or specify computer-executable source code to be executed by virtual machine instances on the on-demand code execution system. Each set of code on the on-demand code execution system may define a "task" and, when executed on a virtual machine instance of the on-demand code execution system, implement a particular functionality corresponding to the task. Implementing tasks individually on an on-demand code execution system may be referred to as "execution" of the task (or "task execution"). In some cases, the on-demand code execution system may enable a user to directly trigger execution of a task based on a variety of potential events, such as transmission of application programming interface ("API") calls to the on-demand code execution system or transmission of special format hypertext transfer protocol ("HTTP") packets to the on-demand code execution system. According to embodiments of the present disclosure, the on-demand code execution system may further interact with the object storage system to perform tasks during application of the data manipulation pipeline to the I/O path. Thus, an on-demand code execution system may execute any specified executable code "on demand" without requiring configuration or maintenance of the underlying hardware or infrastructure on which the code is executed. Furthermore, the on-demand code execution system may be configured to execute tasks in a fast manner (e.g., below 100 milliseconds [ ms ]) to enable execution of tasks in "real-time" (e.g., with little perceived delay by an end user). 
To achieve such fast execution, an on-demand code execution system may include one or more virtual machine instances that are "pre-warmed" or pre-initialized (e.g., booted into an operating system and executing a complete or substantially complete runtime environment) and configured to be able to execute user-defined code such that the code may be executed quickly in response to a request to execute the code without incurring delays from initializing the virtual machine instance. Thus, when execution of a task is triggered, code corresponding to the task can be executed in a virtual machine initialized in advance in a short time.
In particular, to perform tasks, the on-demand code execution system described herein may maintain a pool of pre-initialized virtual machine instances that are ready for use upon receipt of a request to perform a task. Due to the pre-initialized nature of these virtual machines, the delay (sometimes referred to as latency) associated with executing task code (e.g., instance and language runtime startup time) can be significantly reduced, often to sub-100-millisecond levels. Illustratively, the on-demand code execution system may maintain a pool of virtual machine instances on one or more physical computing devices, with each virtual machine instance having one or more software components (e.g., operating systems, language runtimes, libraries, etc.) loaded thereon. When the on-demand code execution system receives a request to execute the program code of a task, it may select a virtual machine instance for executing the user's program code based on one or more computing constraints associated with the task (e.g., a required operating system or runtime) and cause the task to be executed on the selected virtual machine instance. Tasks may be executed in isolated containers created on the virtual machine instances, or within a virtual machine instance that is isolated from other virtual machine instances acting as environments for other tasks. Because the virtual machine instances in the pool have already booted and loaded a particular operating system and language runtime by the time a request is received, the latency associated with finding compute capacity able to handle the request (e.g., by executing the user code in one or more containers created on a virtual machine instance) can be significantly reduced.
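The warm-pool selection step can be illustrated with a toy in-memory model. The pool contents, field names, and runtime labels are invented for illustration; a real system would track far more state (memory, scaling, expiry) than this sketch does.

```python
# Toy model of warm-pool selection: pick a pre-initialized environment
# matching the task's computing constraint (runtime, here) instead of
# booting a fresh virtual machine.
WARM_POOL = [
    {"id": "vm-1", "runtime": "python3.9", "busy": False},
    {"id": "vm-2", "runtime": "nodejs18", "busy": False},
]

def acquire_environment(required_runtime):
    """Return an idle pre-warmed instance matching the constraint, if any."""
    for instance in WARM_POOL:
        if instance["runtime"] == required_runtime and not instance["busy"]:
            instance["busy"] = True
            return instance
    return None  # would fall back to a cold start (not modeled here)
```

The latency win described above comes entirely from the hit path: a matching idle instance means no OS boot or runtime initialization stands between the request and the code.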
As used herein, the term "virtual machine instance" is intended to refer to an execution of software or other executable code that emulates hardware to provide an environment or platform ("execution environment") on which software may execute. Virtual machine instances are generally executed by hardware devices, which may differ from the physical hardware emulated by the virtual machine instance. For example, a virtual machine may emulate a first type of processor and memory while executing on a second type of processor and memory. Thus, virtual machines can be used to execute software intended for a first execution environment (e.g., a first operating system) on a physical device executing a second execution environment (e.g., a second operating system). In some instances, the hardware emulated by a virtual machine instance may be the same as or similar to the hardware of the underlying device. For example, a device with a first type of processor may implement multiple virtual machine instances, each emulating an instance of that first type of processor. Thus, virtual machine instances can be used to divide a device into a number of logical sub-devices (each referred to as a "virtual machine instance"). While virtual machine instances may generally provide a level of abstraction away from the hardware of an underlying physical device, this abstraction is not required. For example, assume a device implements multiple virtual machine instances, each emulating hardware identical to that provided by the device. Under this scenario, each virtual machine instance may allow a software application to execute code on the underlying hardware without translation, while maintaining logical separation from software applications running on other virtual machine instances. This process, generally referred to as "native execution," may be utilized to increase the speed or performance of virtual machine instances.
Other techniques that allow direct utilization of the underlying hardware, such as hardware pass-through techniques, may also be used.
While a virtual machine executing an operating system is described herein as one example of an execution environment, other execution environments are possible. For example, tasks or other processes may be performed within a software "container" that provides a runtime environment, but does not itself provide for virtualization of hardware. The container may be implemented within the virtual machine to provide additional security, or may be run outside of the virtual machine instance.
The foregoing aspects and many of the attendant advantages of this disclosure will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings.
FIG. 1 is a block diagram of an illustrative operating environment 100 in which a service provider system 110 operates to enable a client device 102 to perform I/O operations on objects stored within an object store service 160 and apply path modifications to such I/O operations, which modifications may include executing user-defined code on an on-demand code execution system 120.
By way of illustration, various example client devices 102 are shown in communication with a service provider system 110, including desktop computers, laptop computers, and mobile phones. In general, client device 102 may be any computing device, such as a desktop computer, a laptop or tablet computer, a personal computer, a wearable computer, a server, a Personal Digital Assistant (PDA), a hybrid PDA/mobile phone, a mobile phone, an electronic book reader, a set-top box, a voice command device, a camera, a digital media player, and so forth.
In general, the object store service 160 is operable to enable clients to read, write, modify, and delete data objects, each data object representing a set of data associated with an identifier ("object identifier" or "resource identifier") that can interact as a separate resource. For example, an object may represent a single file submitted by client device 102 (although object store service 160 may or may not store such an object as a single file). Such object-level interactions may be contrasted with other types of storage services, such as block-based storage services that provide data manipulation at the level of individual blocks or database storage services that provide data manipulation at the level of tables (or portions thereof), and so forth.
The object storage service 160 illustratively includes one or more front ends 162, which provide an interface (a command line interface (CLI), application programming interface (API), or other programmatic interface) through which client devices 102 may interact with the service 160 to configure the service 160 on their behalf and to perform I/O operations on the service 160. For example, a client device 102 may interact with the front end 162 to create a collection of data objects on the service 160 (e.g., an object "bucket") and to configure permissions for that collection. Client devices 102 may thereafter create, read, update, or delete objects within the collection via the interface of the front end 162. In one embodiment, the front end 162 provides a REST-compliant HTTP interface supporting a variety of request methods, each of which corresponds to a requested I/O operation on the service 160. By way of non-limiting example, request methods may include:
● A GET operation requesting retrieval of an object stored on the service 160 by reference to an identifier of the object;
● A PUT operation requesting storage of an object on the service 160, including an identifier of the object and input data to be stored as the object;
● A DELETE operation requesting deletion of an object stored on the service 160 by reference to an identifier of the object; and
● A LIST operation requesting a listing of objects within a collection of objects stored on the service 160 by reference to an identifier of the collection.
A variety of other operations may also be supported. For example, the service 160 may provide a POST operation similar to a PUT operation but associated with a different upload mechanism (e.g., a browser-based HTML upload), or a HEAD operation enabling retrieval of an object's metadata without retrieval of the object itself. In some embodiments, the service 160 may enable operations that combine one or more of the above operations, or that combine an operation with a native data manipulation. For example, the service 160 may provide a COPY operation enabling copying of an object stored on the service 160 to another object, which combines a GET operation with a PUT operation. As another example, the service 160 may provide a SELECT operation enabling specification of an SQL query to be applied to an object prior to returning the contents of that object, which combines an application of an SQL query against a data object (a native data manipulation) with a GET operation. As yet another example, the service 160 may provide a "byte-range" GET, which enables a GET operation on only a portion of a data object. In some instances, the operation requested by a client device 102 on the service 160 may be transmitted to the service via an HTTP request, which request may itself specify an HTTP method. In some cases, such as in the case of a GET operation, the HTTP method specified within the request may match the operation requested at the service 160. However, in other cases, the HTTP method of a request may not match the operation requested at the service 160. For example, a request may utilize an HTTP POST method to transmit a request to implement a SELECT operation at the service 160.
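The request methods above can be sketched as a minimal in-memory model. All names below are illustrative only; the actual service exposes these methods as REST-compliant HTTP operations against named buckets, not as a Python API.

```python
class ObjectStore:
    """Minimal in-memory sketch of the request methods described above
    (GET, PUT, DELETE, LIST), including a "byte-range" GET.
    Illustrative only; not the service's actual interface."""

    def __init__(self):
        self._objects = {}  # object identifier -> bytes

    def put(self, identifier, data):
        # PUT: store input data as the object named by `identifier`.
        self._objects[identifier] = data

    def get(self, identifier, byte_range=None):
        # GET: return the object; a "byte-range" GET returns only a
        # portion of the object's data.
        data = self._objects[identifier]
        if byte_range is not None:
            start, end = byte_range
            return data[start:end]
        return data

    def delete(self, identifier):
        # DELETE: remove the object by reference to its identifier.
        del self._objects[identifier]

    def list(self, prefix=""):
        # LIST: enumerate object identifiers within a collection,
        # modeled here as a shared identifier prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("bucket/a.txt", b"hello world")
store.put("bucket/b.txt", b"goodbye")
print(store.get("bucket/a.txt", byte_range=(0, 5)))  # b'hello'
print(store.list("bucket/"))
```

Combined operations such as COPY can then be understood as compositions of these primitives (a GET whose result is supplied as the input data of a PUT).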
During general operation, the front end 162 may be configured to obtain a call to a request method, and to apply that request method to input data for the method. For example, the front end 162 can respond to a request to PUT input data into the service 160 as an object by storing that input data as the object on the service 160. Objects may be stored, for example, on object data stores 168, which correspond to any persistent or substantially persistent storage (including hard disk drives (HDDs), solid state drives (SSDs), network-accessible storage (NAS), storage area networks (SANs), non-volatile random access memory (NVRAM), or any of a variety of storage devices known in the art). As a further example, the front end 162 can respond to a request to GET an object from the service 160 by retrieving the object from the stores 168 (the object representing input data to the GET resource request) and returning the object to the requesting client device 102.
In some cases, a call to a request method may invoke one or more native data manipulations provided by the service 160. For example, a SELECT operation may provide an SQL-formatted query to be applied to an object (also identified within the request), or a GET operation may provide a specific range of bytes of an object to be returned. The service 160 illustratively includes an object manipulation engine 170 configured to perform native data manipulations, which illustratively corresponds to a device configured with software executable to implement native data manipulations on the service 160 (e.g., by stripping non-selected bytes from an object for a byte-range GET, by applying an SQL query to an object and returning results of the query, etc.).
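A SELECT-style native manipulation can be sketched as a row filter applied to a CSV-formatted object before its contents are returned. The real service accepts an SQL-formatted query; a Python predicate stands in for the query here, and all names are illustrative.

```python
import csv
import io

def select_rows(object_data, predicate):
    """Sketch of a SELECT-style native data manipulation: apply a
    row-level filter to a CSV-formatted object before returning its
    contents, rather than returning the whole object."""
    reader = csv.DictReader(io.StringIO(object_data.decode("utf-8")))
    return [row for row in reader if predicate(row)]

# Hypothetical object content: a small CSV table.
obj = b"name,age\nana,34\nbo,19\ncy,42\n"
# Stand-in for the SQL query: return only rows where age > 30.
matches = select_rows(obj, lambda row: int(row["age"]) > 30)
print([row["name"] for row in matches])  # ['ana', 'cy']
```

A byte-range GET is the analogous native manipulation for unstructured data: the engine simply strips all bytes outside the requested range before returning the object.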
In accordance with embodiments of the present disclosure, the service 160 can further be configured to enable modification of an I/O path for a given object or collection of objects, such that a called request method is applied to an output of a data manipulation function, rather than to the resource identified within the call. For example, the service 160 may enable a client device 102 to specify that GET operations for a given object should be subject to execution of a user-defined task on the on-demand code execution system 120, such that the data returned in response to the operation is the output of a task execution rather than the requested object. Similarly, the service 160 may enable a client device 102 to specify that PUT operations to store a given object should be subject to execution of a user-defined task on the on-demand code execution system 120, such that the data stored in response to the operation is the output of a task execution rather than the data provided for storage by the client device 102. As will be discussed in more detail below, path modifications may include specification of a pipeline of data manipulations, including native data manipulations, task-based manipulations, or combinations thereof. Illustratively, a client device 102 may specify a pipeline or other data manipulation for an object or object collection through the front end 162, which may store a record of the pipeline or manipulation in the I/O path modification data store 164, which store 164, like the object data stores 168, may represent any persistent or substantially persistent storage. While shown as distinct in FIG. 1, in some instances the data stores 164 and 168 may represent a single collection of data stores. For example, data modifications for objects or collections may themselves be stored as objects on the service 160.
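The path-modification concept above can be sketched by wrapping the store's request methods in per-path manipulation pipelines, so that the method is applied to the pipeline's output rather than to the caller's data. The class and function names, and the SSN-shaped redaction rule, are illustrative assumptions, not the service's actual mechanism.

```python
import re

def redact_ssns(data):
    # Hypothetical user-defined manipulation: filter sensitive data
    # (here, anything shaped like a US social security number).
    return re.sub(rb"\d{3}-\d{2}-\d{4}", b"***-**-****", data)

class PathModifiedStore:
    """Sketch of an I/O path modification: the requested method (PUT
    or GET) is applied to the output of a data manipulation pipeline,
    not to the data named in the call. Illustrative names only."""

    def __init__(self):
        self._objects = {}
        # Pipelines attached to the input and output I/O paths.
        self._put_pipeline = [redact_ssns]
        self._get_pipeline = []

    def put(self, identifier, data):
        for manipulation in self._put_pipeline:
            data = manipulation(data)       # manipulate the input data...
        self._objects[identifier] = data    # ...then apply the PUT method

    def get(self, identifier):
        data = self._objects[identifier]
        for manipulation in self._get_pipeline:
            data = manipulation(data)       # manipulate before returning
        return data

store = PathModifiedStore()
store.put("customers.txt", b"id 123-45-6789 balance 10")
print(store.get("customers.txt"))  # the sensitive field never reached storage
```

Because the pipeline runs before the request method, the stored (or returned) data need not match the data named in the request, which is precisely the behavior described above.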
To implement data manipulation via execution of user-defined code, the system further includes the on-demand code execution system 120. In one embodiment, the system 120 may be used solely by the object store service 160 in connection with data manipulations of an I/O path. In another embodiment, the system 120 is additionally accessible by client devices 102 to directly enable serverless task executions. For example, the on-demand code execution system 120 may provide the service 160 (and potentially client devices 102) with one or more user interfaces, command line interfaces (CLIs), application programming interfaces (APIs), or other programmatic interfaces for generating and uploading user-executable code (e.g., including metadata identifying dependency code objects for the uploaded code), invoking the user-provided code (e.g., submitting a request to execute the user code on the on-demand code execution system 120), scheduling event-based jobs or timed jobs, tracking the user-provided code, or viewing other logging or monitoring information related to their requests or user code. Although one or more embodiments may be described herein as using a user interface, it should be appreciated that such embodiments may, additionally or alternatively, use any CLI, API, or other programmatic interface.
Client device 102, object store service 160, and on-demand code execution system 120 may communicate via network 104, which may include any wired network, wireless network, or combination thereof. For example, the network 104 may be a personal area network, a local area network, a wide area network, an over-the-air network (e.g., for radio or television), a cable network, a satellite network, a cellular telephone network, or a combination thereof. As another example, the network 104 may be a publicly accessible network of linked networks, possibly operated by various distinct parties, such as the internet. In some embodiments, the network 104 may be a private or semi-private network, such as a corporate or university intranet. The network 104 may include one or more wireless networks, such as a Global System for Mobile Communications (GSM) network, a Code Division Multiple Access (CDMA) network, a Long Term Evolution (LTE) network, or any other type of wireless network. The network 104 may use protocols and components for communicating via the internet or any of the other aforementioned types of networks. For example, the protocols used by the network 104 may include Hypertext Transfer Protocol (HTTP), HTTP Secure (HTTPS), Message Queue Telemetry Transport (MQTT), Constrained Application Protocol (CoAP), and the like. Protocols and components for communicating via the internet or any of the other aforementioned types of communication networks are well known to those skilled in the art and, thus, are not described in greater detail herein.
To enable interaction with the on-demand code execution system 120, the system 120 includes one or more front ends 130 that enable interaction with the on-demand code execution system 120. In the illustrative embodiment, front end 130 acts as a "front door" to other services provided by on-demand code execution system 120, thereby enabling a user (via client device 102) or service 160 to provide computer executable code, request execution of the computer executable code, and view the results of the computer executable code. Front end 130 includes various components to enable interaction between on-demand code execution system 120 and other computing devices. For example, each front end 130 may include a request interface that provides the client device 102 and service 160 with the ability to upload or otherwise communicate user-specified code to the on-demand code execution system 120 and thereafter request execution of the code. In one implementation, the request interface communicates with an external computing device (e.g., client device 102, front end 162, etc.) via a Graphical User Interface (GUI), CLI, or API. The front end 130 processes the request and ensures that the request is properly authorized. For example, the front end 130 may determine whether the user associated with the request is authorized to access the user code specified in the request.
User code, as used herein, may refer to any program code (e.g., a program, routine, subroutine, thread, etc.) written in a specific programming language. In the present disclosure, the terms "code," "user code," and "program code" may be used interchangeably. Such user code may be executed, for example, in connection with a particular data transformation developed by the user, to achieve a specific function. As noted above, an individual collection of user code (e.g., to achieve a specific function) is referred to herein as a "task," while a specific execution of that code (including, e.g., compiling the code, interpreting the code, or otherwise making the code executable) is referred to as a "task execution" or simply an "execution." By way of non-limiting example, tasks may be written in JavaScript (e.g., Node.js), Java, Python, and/or Ruby (or another programming language).
To manage requests for code execution, the front end 130 may include an execution queue, which can maintain a record of requested task executions. Illustratively, the number of simultaneous task executions by the on-demand code execution system 120 is limited, and as such, new task executions initiated at the on-demand code execution system 120 (e.g., via an API call, via a call from an executed or executing task, etc.) may be placed on the execution queue and processed, e.g., in a first-in-first-out order. In some embodiments, the on-demand code execution system 120 may include multiple execution queues, such as an individual execution queue for each user account. For example, a user of the service provider system 110 may wish to limit the rate of task executions on the on-demand code execution system 120 (e.g., for cost reasons). Thus, the on-demand code execution system 120 may utilize an account-specific execution queue to throttle the rate of simultaneous task executions by a specific user account. In some instances, the on-demand code execution system 120 may prioritize task executions, such that task executions of specific accounts or of specified priorities bypass the execution queue or are prioritized for processing within it. In other instances, the on-demand code execution system 120 may execute a task immediately or substantially immediately after receiving a call for that task, and thus the execution queue may be omitted.
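The per-account queuing and throttling described above can be sketched as follows. The class and method names are illustrative assumptions; the real system's queue is internal to the front end 130.

```python
from collections import defaultdict, deque

class ExecutionQueues:
    """Sketch of per-account execution queues: each account's task
    calls are processed first-in-first-out, and an account-level limit
    throttles how many of that account's tasks run concurrently."""

    def __init__(self, per_account_limit):
        self.limit = per_account_limit
        self.queues = defaultdict(deque)   # account -> FIFO of pending tasks
        self.running = defaultdict(int)    # account -> running-task count

    def enqueue(self, account, task):
        # A new task execution is placed on the account's queue.
        self.queues[account].append(task)

    def next_runnable(self, account):
        # Dequeue the oldest pending task, but only if the account is
        # below its concurrency limit (the throttle).
        if self.queues[account] and self.running[account] < self.limit:
            self.running[account] += 1
            return self.queues[account].popleft()
        return None

    def finished(self, account):
        # A task execution completed, freeing one concurrency slot.
        self.running[account] -= 1

q = ExecutionQueues(per_account_limit=1)
q.enqueue("acct-1", "task-a")
q.enqueue("acct-1", "task-b")
print(q.next_runnable("acct-1"))  # task-a runs
print(q.next_runnable("acct-1"))  # None: acct-1 is at its limit
q.finished("acct-1")
print(q.next_runnable("acct-1"))  # task-b
```

A priority scheme as described above could be layered on by letting certain accounts bypass `next_runnable`'s limit check, but that refinement is omitted here for brevity.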
The front end 130 may also include an output interface configured to output information regarding the execution of tasks on the on-demand code execution system 120. Illustratively, the output interface may transmit data regarding task executions (e.g., results of a task, errors related to the task execution, or details of the task execution, such as its total time required to complete the execution, the total data processed via the execution, etc.) to the user computing devices 102 or to the object storage service 160.
In some embodiments, the on-demand code execution system 120 may include multiple front ends 130. In such embodiments, a load balancer may be provided to distribute the incoming calls to the multiple front ends 130, for example, in a round-robin fashion. In some embodiments, the manner in which the load balancer distributes incoming calls to the multiple front ends 130 may be based on the location or state of other components of the on-demand code execution system 120. For example, a load balancer may distribute calls to a geographically nearby front end 130, or to a front end with capacity to service the call. In instances where each front end 130 corresponds to an individual instance of another component of the on-demand code execution system 120, such as the active pool 148 described below, the load balancer may distribute calls according to the capacities or loads on those other components. Calls may in some instances be distributed between front ends 130 deterministically, such that a given call to execute a task will always (or almost always) be routed to the same front end 130. This may, for example, assist in maintaining an accurate execution record for a task, to ensure that the task executes only a desired number of times. In other instances, calls may be distributed simply to balance load across the front ends 130. Other distribution techniques, such as anycast routing, will be apparent to those of skill in the art.
The on-demand code execution system 120 also includes one or more worker managers 140 that manage an execution environment, such as virtual machine instance 150 (shown as VM instances 150A and 150B, commonly referred to as "VMs"), for servicing incoming calls to perform tasks. Although described below with reference to virtual machine instance 150 as an example of such an environment, embodiments of the present disclosure may utilize other environments, such as a software container. In the example shown in FIG. 1, each worker manager 140 manages an activity pool 148, which is a group (sometimes referred to as a pool) of virtual machine instances 150 executing on one or more physical host computing devices that are initialized to perform a given task (e.g., by loading code and any dependency data objects for the task into the instances).
While the virtual machine instances 150 are described here as assigned to a particular task, in some embodiments the instances may be assigned to a group of tasks, such that the instance is tied to the group of tasks and any tasks of the group can be executed within the instance. For example, the tasks in the same group may belong to the same security group (e.g., based on their security credentials), such that executing one task in a container on a particular instance 150 after another task has been executed in another container on the same instance does not pose a security risk. As discussed below, a task may be associated with permissions encompassing a variety of aspects controlling how the task may execute. For example, permissions of a task may define what network connections (if any) can be initiated by an execution environment of the task. As another example, permissions of a task may define what authentication information is passed to the task, controlling what network-accessible resources are accessible to execution of the task (e.g., objects on the service 160). In one embodiment, a security group of a task is based on one or more such permissions. For example, a security group may be defined based on a combination of permissions to initiate network connections and permissions to access network resources. As another example, the tasks of a group may share common dependencies, such that an environment used to execute one task of the group can be rapidly modified to support execution of another task within the group.
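The grouping rule above can be sketched by deriving a security-group key from a task's permissions: tasks whose keys match may safely share an instance. The permission fields and the grouping function are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskPermissions:
    """Illustrative sketch of the permission aspects described above."""
    may_open_network_connections: bool
    accessible_resources: frozenset  # e.g., object identifiers on the store

def security_group(perms):
    # Tasks whose permissions match may share a VM instance: the group
    # key combines the network-connection permission with the set of
    # network-accessible resources the task may reach.
    return (perms.may_open_network_connections,
            perms.accessible_resources)

a = TaskPermissions(False, frozenset({"bucket/x"}))
b = TaskPermissions(False, frozenset({"bucket/x"}))
c = TaskPermissions(True, frozenset({"bucket/x"}))
print(security_group(a) == security_group(b))  # True: may share an instance
print(security_group(a) == security_group(c))  # False: c may open connections
```

A dependency-based grouping, as also mentioned above, would use a key derived from the tasks' shared dependency objects instead of (or in addition to) their permissions.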
On successfully processing a triggering event to execute a task, the front end 130 passes a request to a worker manager 140 to execute the task. In one embodiment, each front end 130 may be associated with a corresponding worker manager 140 (e.g., a worker manager 140 co-located with, or geographically nearby, the front end 130), and thus the front end 130 may pass most or all requests to that worker manager 140. In another embodiment, a front end 130 may include a location selector configured to determine the worker manager 140 to which to pass the execution request. In one embodiment, the location selector may determine the worker manager 140 to receive a call based on hashing the call, and may distribute the call to a worker manager 140 selected based on the hashed value (for example, via a hash ring). Various other mechanisms for distributing calls among worker managers 140 will be apparent to one of skill in the art.
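The hash-ring selection mentioned above can be sketched as follows: each worker manager owns a point on a ring of hash values, and a call is routed to the first manager at or after the call's own hash, so the same call is (nearly) always routed to the same manager. This is a minimal sketch; production hash rings typically add virtual nodes for smoother balancing.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Sketch of hash-ring distribution of calls among worker managers,
    giving deterministic routing for a given call. Illustrative names."""

    def __init__(self, managers):
        # Place each manager on the ring at the hash of its name.
        self.ring = sorted((self._hash(m), m) for m in managers)
        self._points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def manager_for(self, call_id):
        # Route to the first manager clockwise from the call's hash,
        # wrapping around the ring if needed.
        idx = bisect_right(self._points, self._hash(call_id)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["wm-1", "wm-2", "wm-3"])
# The same call always maps to the same worker manager:
print(ring.manager_for("task:alpha") == ring.manager_for("task:alpha"))  # True
```

A useful property of this scheme, compared to `hash(call) % n`, is that adding or removing one manager remaps only the calls adjacent to its ring position rather than nearly all of them.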
Thereafter, the worker manager 140 may modify a virtual machine instance 150 (if necessary) and execute the code of the task within the instance 150. As shown in FIG. 1, respective instances 150 may have operating systems (OS) 152 (shown as OS 152A and 152B), language runtimes 154 (shown as runtimes 154A and 154B), and user code 156 (shown as user code 156A and 156B). The OS 152, runtime 154, and user code 156 may collectively enable execution of the user code to implement the task. Thus, via operation of the on-demand code execution system 120, tasks may be rapidly executed within an execution environment.
In accordance with aspects of the present disclosure, each VM 150 additionally includes staging code 157 executable to facilitate staging of input data on the VM 150 and handling of output data written on the VM 150, as well as a VM data store 158 accessible through a local file system of the VM 150. Illustratively, the staging code 157 represents a process executing on the VM 150 (or potentially a host device of the VM 150) that is configured to obtain data from the object storage service 160 and place that data into the VM data store 158. The staging code 157 can further be configured to obtain data written to a file within the VM data store 158, and to transmit that data to the object storage service 160. Because such data is available at the VM data store 158, the user code 156 is not required to obtain the data over a network, simplifying the user code 156 and enabling further restriction of network communications by the user code 156, thus increasing security. Rather, as discussed above, the user code 156 may interact with the input data and output data as files on the VM data store 158, by use of file handles passed to the code 156 during an execution. In some embodiments, input and output data may be stored as files within a kernel-space file system of the data store 158. In other instances, the staging code 157 may provide a virtual file system, such as a Filesystem in Userspace (FUSE) interface, which provides an isolated file system accessible to the user code 156, such that the user code's access to the VM data store 158 is restricted.
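The staging flow above can be sketched end to end: staging code places input data on the local file system, user code reads and writes ordinary files through handles it is given, and staging code uploads the written output. All function names here are illustrative stand-ins for the staging code 157 and the object storage service 160.

```python
import os
import tempfile

def run_task_with_staged_io(fetch_object, user_code, store_object):
    """Sketch of the staging flow described above. `fetch_object` and
    `store_object` stand in for the staging code's interactions with
    the object storage service; `user_code` sees only local files."""
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "input")
    out_path = os.path.join(workdir, "output")

    # Staging: obtain the data from the object store, place it in a file.
    with open(in_path, "wb") as f:
        f.write(fetch_object())

    # User code interacts with input and output as file handles;
    # it never needs to open a network connection itself.
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        user_code(fin, fout)

    # Staging: transmit the written output back to the object store.
    with open(out_path, "rb") as f:
        store_object(f.read())

stored = {}
run_task_with_staged_io(
    fetch_object=lambda: b"hello",
    user_code=lambda fin, fout: fout.write(fin.read().upper()),
    store_object=lambda data: stored.update(result=data),
)
print(stored["result"])  # b'HELLO'
```

Keeping network interactions entirely within the staging side is what allows the user code's own network access to be restricted without loss of function.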
As used herein, the term "local file system" generally refers to a file system maintained in an execution environment such that software executing within the environment may access data as files, rather than being connected via a network. According to aspects of the present disclosure, the data storage device accessible via the local file system may itself be local (e.g., a local physical storage device), or may be remote (e.g., accessed via a network protocol (such as NFS), or represented as a virtualized block device provided by a network accessible service). Thus, the term "local file system" is intended to describe the mechanism by which software accesses data, rather than the physical location of the data.
VM data store 158 may include any persistent or non-persistent data store. In one embodiment, VM data store 158 is a physical storage device of the host apparatus, or a virtual disk drive hosted on a physical storage device of the host apparatus. In another embodiment, VM data store 158 is represented as a local storage device, but is actually a virtualized storage provided by a network accessible service. For example, VM data store 158 can be a virtualized disk drive provided by a network accessible block storage service. In some embodiments, the object store service 160 may be configured to provide file-level access to objects stored on the data store 168, thereby enabling the VM data store 158 to be virtualized based on communications between the staging code 157 and the service 160. For example, the object store service 160 may include a file-level interface 166 that provides network access to objects that are files within a data store 168. For example, the file-level interface 166 may represent a network-based file system server (e.g., network File System (NFS)) that provides access to objects that are files, and the staging code 157 may implement a client of the server, thereby providing file-level access to the objects of the service 160.
In some cases, the VM data store 158 may represent virtualized access to another data store executing on the same host device as the VM instance 150. For example, the active pool 148 may include one or more data staging VM instances (not shown in FIG. 1), which may be co-located with VM instances 150 on the same host device. A data staging VM instance may be configured to support retrieval and storage of data from the service 160 (e.g., data objects or portions thereof, input data passed by client devices 102, etc.), and storage of that data on a data store of the data staging VM instance. The data staging VM instance may, for example, be designated as unavailable to support execution of user code 156, and thus be associated with elevated permissions relative to the instances 150 supporting execution of user code. The data staging VM instance may make this data accessible to other VM instances 150 within its host device (or, potentially, on nearby host devices), such as by use of a network-based file protocol, like NFS. Other VM instances 150 may then act as clients of the data staging VM instance, enabling creation of a virtualized VM data store 158 that, from the point of view of the user code 156A, appears as a local data store. Beneficially, given the co-location of a data staging VM and a VM instance 150 within a host device or on nearby host devices, network-based access to data stored at the data staging VM can be expected to occur very quickly.
While some examples are provided herein with respect to use of IO stream handles to read from or write to a VM data store 158, IO streams may additionally be used to read from or write to other interfaces of a VM instance 150 (while still removing the need for user code 156 to conduct operations other than stream-level operations, such as creating network connections). For example, the staging code 157 may "pipe" input data to an execution of the user code 156 as an input stream, and the output of that execution may be "piped" to the staging code 157 as an output stream. As another example, a staging VM instance or a hypervisor of the VM instance 150 may pass input data to a network port of the VM instance 150, which may be read from by the staging code 157 and passed as an input stream to the user code 156. Similarly, data written to an output stream by the user code 156 may be written to a second network port of the instance 150A for retrieval by the staging VM instance or the hypervisor. In yet another example, a hypervisor of the instance 150 may pass input data as data written to a virtualized hardware input device (e.g., a keyboard), and the staging code 157 may pass to the user code 156 a handle to an IO stream corresponding to that input device. The hypervisor may similarly pass to the user code 156 a handle to an IO stream corresponding to a virtualized hardware output device, and read data written to that stream as output data. Thus, the examples provided herein with respect to file streams may generally be modified to relate to any IO stream.
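The stream-based variant above can be sketched with in-memory streams: staging code supplies an input-stream handle, the task writes to an output-stream handle, and staging code collects the result. Function names are illustrative; `io.BytesIO` here stands in for whatever stream the staging code actually exposes (a pipe, a network port, a virtualized device).

```python
import io

def pipe_through_task(input_data, task):
    """Sketch of stream-based staging: input data is 'piped' to the
    task as an input stream, and the task's writes to an output
    stream are collected as the output data."""
    in_stream = io.BytesIO(input_data)   # handle the staging code provides
    out_stream = io.BytesIO()            # handle the staging code reads back
    task(in_stream, out_stream)          # the task sees only stream handles
    return out_stream.getvalue()

def reverse_lines(in_stream, out_stream):
    # Example task: performs stream-level operations only; it never
    # opens a network connection or touches the file system.
    for line in in_stream:
        out_stream.write(line.rstrip(b"\n")[::-1] + b"\n")

print(pipe_through_task(b"abc\ndef\n", reverse_lines))  # b'cba\nfed\n'
```

Because the task touches only the two stream handles it is handed, the same task code works unchanged whether those handles are backed by files, pipes, network ports, or virtualized devices.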
The object storage service 160 and the on-demand code execution system 120 are depicted in FIG. 1 as operating in a distributed computing environment that includes several computer systems interconnected using one or more computer networks (not shown in FIG. 1). The object storage service 160 and the on-demand code execution system 120 may also operate within a computing environment having fewer or greater numbers of devices than shown in FIG. 1. Accordingly, the depiction of the object store service 160 and the on-demand code execution system 120 in FIG. 1 should be considered illustrative and not limiting of the present disclosure. For example, the on-demand code execution system 120, or various components thereof, may implement various network service components, a hosted or "cloud" computing environment, or a peer-to-peer network configuration to implement at least a portion of the processes described herein. In some cases, the object store service 160 and the on-demand code execution system 120 may be combined into a single service. Further, the object storage service 160 and the on-demand code execution system 120 may be implemented directly in hardware or in software executed by hardware devices, and may, for example, comprise one or more physical or virtual servers implemented on physical computer hardware configured to execute computer-executable instructions to perform the various features that will be described herein. The one or more servers may be geographically dispersed or geographically co-located, for example, in one or more data centers. In some cases, one or more servers may operate as part of a system of rapidly provisioned and released computing resources (commonly referred to as a "cloud computing environment").
In the example of FIG. 1, object store service 160 and on-demand code execution system 120 are shown as being connected to network 104. In some implementations, any of the components within the object storage service 160 and the on-demand code execution system 120 can communicate with other components of the on-demand code execution system 120 via the network 104. In other embodiments, not all components of the object store service 160 and the on-demand code execution system 120 are capable of communicating with other components of the virtual environment 100. In one example, only front ends 130 and 162 (which may represent multiple front ends in some cases) may be connected to network 104, and object storage services 160 and other components of on-demand code execution system 120 may communicate with other components of environment 100 via respective front ends 130 and 162.
Although some functionality is generally described herein with reference to separate components of the object storage service 160 and the on-demand code execution system 120, other components or combinations of components may additionally or alternatively implement such functionality. For example, while the object store service 160 is depicted in FIG. 1 as including an object manipulation engine 170, the functionality of the engine 170 may additionally or alternatively be implemented as tasks on the on-demand code execution system 120. Further, while the on-demand code execution system 120 is described as an example system that applies data manipulation tasks, other computing systems may be used to perform user-defined tasks that may include more, fewer, or different components than depicted as part of the on-demand code execution system 120. In a simplified example, object store service 160 may include a physical computing device configured to perform user-defined tasks on demand, thus representing a computing system usable in accordance with embodiments of the present disclosure. Thus, the specific configuration of elements in fig. 1 is intended to be illustrative.
FIG. 2 depicts a general architecture of a front-end server 200 computing device implementing the front end 162 of FIG. 1. The general architecture of the front-end server 200 depicted in FIG. 2 includes an arrangement of computer hardware and software that may be used to implement aspects of the present disclosure. The hardware may be implemented on physical electronic devices, as discussed in greater detail below. The front-end server 200 may include many more (or fewer) elements than those shown in FIG. 2. It is not necessary, however, that all of these generally conventional elements be shown in order to provide an enabling disclosure. Additionally, the general architecture illustrated in FIG. 2 may be used to implement one or more of the other components illustrated in FIG. 1.
As shown, the front end server 200 includes a processing unit 290, a network interface 292, a computer readable medium drive 294, and an input/output device interface 296, all of which may communicate with each other via a communication bus. The network interface 292 may provide connectivity to one or more networks or computing systems. Thus, processing unit 290 may receive information and instructions from other computing systems or services via network 104. The processing unit 290 may also communicate with a main memory 280 or a secondary memory 298, and further provide output information for an optional display (not shown) via an input/output device interface 296. Input/output device interface 296 may also accept input from an optional input device (not shown).
The main memory 280 or secondary memory 298 may contain computer program instructions (grouped as units in some embodiments) that the processing unit 290 executes in order to implement one or more aspects of the present disclosure. These program instructions are shown in FIG. 2 as included within the main memory 280, but may additionally or alternatively be stored within the secondary memory 298. The main memory 280 and secondary memory 298 correspond to one or more tiers of memory devices, including (but not limited to) RAM, 3D XPOINT memory, flash memory, magnetic storage, and the like. For the purposes of description, it is assumed that the main memory 280 represents a main working memory of the front-end server 200, with a higher speed but lower total capacity than the secondary memory 298.
Main memory 280 may store an operating system 284 that provides computer program instructions for general management and operation of front-end server 200 by processing unit 290. Memory 280 may also include computer program instructions and other information for implementing aspects of the present disclosure. For example, in one embodiment, memory 280 includes a user interface unit 282 that generates a user interface (or instructions thereof) for display on a computing device, for example, via a navigation or browsing interface (such as a browser or application program) installed on the computing device.
In addition to or in combination with the user interface unit 282, the memory 280 may include a control plane unit 286 and a data plane unit 288, each of which may be executed to implement aspects of the present disclosure. Illustratively, according to embodiments of the present disclosure, the control plane unit 286 may include code executable to enable an owner of a data object or set of objects to attach a manipulation, server-less function, or data processing pipeline to an I/O path. For example, control plane unit 286 may enable front end 162 to implement the interactions of fig. 3. The data plane unit 288 may illustratively include code capable of handling I/O operations on the object store service 160, including implementing manipulations attached to the I/O path, server-less functions, or data processing pipelines (e.g., via interactions of FIGS. 5A-6B, implementations of the routines of FIGS. 7-8, etc.).
The front-end server 200 of fig. 2 is one illustrative configuration of such a device, and other configurations are possible. For example, while shown as a single device, in some embodiments, the front-end server 200 may be implemented as a plurality of physical host devices. Illustratively, a first device of such a front-end server 200 may implement the control plane unit 286, while a second device may implement the data plane unit 288.
Although depicted in fig. 2 as front-end server 200, other devices shown in environment 100 of fig. 1 may be implemented with similar components in some embodiments. For example, a similar device may implement worker manager 140 as described in more detail in U.S. patent No. 9,323,556 ("the 556 patent") entitled "PROGRAMMATIC EVENT DETECTION AND MESSAGE GENERATION FOR REQUESTS TO EXECUTE PROGRAM CODE" filed on 9 months 30 in 2014, the entire contents of which are incorporated herein by reference.
Referring to FIG. 3, illustrative interactions will be described for enabling client device 102A to modify the I/O path of one or more objects on object store service 160 by inserting a data manipulation into that I/O path, the manipulation being implemented within a task that may be executed on the on-demand code execution system 120.
The interactions of FIG. 3 begin at (1), where the client device 102A authors stream manipulation code. The code may illustratively be configured to access an input file handle provided on execution of the program (e.g., represented by the standard input stream of the program, typically "stdin"), to perform manipulations on data obtained from that file handle, and to write data to an output file handle provided on execution of the program (e.g., represented by the standard output stream of the program, typically "stdout").
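As a minimal illustrative sketch (the language and the "CONFIDENTIAL" marker below are assumptions for illustration; the disclosure does not mandate any particular language or filter), such stream manipulation code might take the following form in Python:

```python
import sys

def filter_stream(src, dst):
    # Copy the input stream to the output stream, dropping any line that
    # carries a (hypothetical) confidentiality marker. Line-by-line
    # processing keeps memory use constant regardless of object size.
    for line in src:
        if "CONFIDENTIAL" not in line:
            dst.write(line)

# At execution time, the service would supply the input and output file
# handles as the program's standard streams, e.g.:
#     filter_stream(sys.stdin, sys.stdout)
```

Because the code touches only the streams it is handed, the same function can be exercised against any pair of in-memory or file-backed streams, which also makes it straightforward to test outside the service.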
Although examples are discussed herein with respect to a "file" handle, embodiments of the present disclosure may utilize handles that provide access to any operating-system-level input/output (IO) stream, examples of which include byte streams, character streams, file streams, and the like. As used herein, the term operating-system-level input/output stream (or simply "IO stream") refers to a data stream for which an operating system provides a defined set of functions, such as seeking within, reading from, and writing to the stream. Streams may be created in a variety of manners. For example, a programming language may generate a stream by using a function library to open a file on the local operating system, or a stream may be created by use of a "pipe" operator (e.g., within an operating system shell command language). As will be appreciated by one skilled in the art, most general-purpose programming languages include the ability to interact with streams as a basic functionality of the code.
According to embodiments of the present disclosure, task code may be authored to accept, as parameters of the code, an input handle and an output handle, both representing IO streams (e.g., an input stream and an output stream, respectively). The code may then manipulate data of the input stream and write an output to the output stream. Given use of a general-purpose programming language, any of a variety of functionalities may be implemented according to a user's desire. For example, a function may search for and remove confidential information from the input stream. While some code may utilize only the input and output handles, other code may implement additional interfaces, such as a network communication interface. However, by providing the code with access to input and output streams created outside of the code (via the respective handles), the code itself need not create such streams. Moreover, because the streams may be created outside of the code, and potentially outside of the execution environment of the code, the stream manipulation code need not be trusted with operations that might otherwise be necessary to create the streams. For example, a stream may represent information transmitted over a network connection, without the code being provided access to that network connection. The use of IO streams to pass data into and out of code executions can therefore simplify code while increasing security.
As described above, code can be authored in a variety of programming languages. Authoring tools for such languages are known in the art and therefore will not be described herein. Although authoring is described in fig. 3 as occurring on the client device 102A, in some cases, the service 160 may provide an interface (e.g., a web GUI) through which code is authored or selected.
At (2), the client device 102A submits the stream manipulation code to the front end 162 of the service 160, along with a request that an execution of the code be inserted into the I/O path for one or more objects. Illustratively, the front end 162 may provide one or more interfaces to the device 102A enabling submission of the code (e.g., as a compressed file). The front end 162 may further provide interfaces enabling designation of one or more I/O paths to which an execution of the code should be applied. Each I/O path may correspond, for example, to an object or collection of objects (e.g., a "bucket" of objects). In some instances, an I/O path may further correspond to a given way of accessing such an object or collection (e.g., a URI through which the object is created), to one or more accounts attempting to access the object or collection, or to other path criteria. Designation of the path modification is then stored in the I/O path modification data store 164 at (3). In addition, at (4), the stream manipulation code is stored within the object data store 166.
Thus, when an I/O request is received via a specified I/O path, service 160 is configured to execute stream manipulation code against the input data of the request (e.g., the data provided by client device 102A or the object of service 160, depending on the I/O request), and then apply the request to the output of the code execution. In this way, the client device 102A (which illustratively represents the owner of an object or set of objects in fig. 3) may gain greater control over the data stored on and retrieved from the object store service 160.
The interactions of FIG. 3 generally relate to insertion of a single data manipulation into the I/O path of an object or collection on the service 160. However, in some embodiments of the present disclosure, the owner of an object or collection is enabled to insert multiple data manipulations into such an I/O path. Each data manipulation may correspond, for example, to a serverless code-based manipulation or to a native operation of the service 160. For example, assume an owner has submitted a data set to the service 160 as an object, and that the owner wishes to provide an end user with a filtered view of a portion of that data set. While the owner could store the filtered view of the portion as a separate object and provide the end user with access to that separate object, doing so results in duplication of data on the service 160. Where the owner wishes to provide multiple end users with different portions of the data set, each potentially with customized filters, that duplication multiplies, resulting in significant inefficiencies. Another option, in accordance with the present disclosure, may be for the owner to author or obtain custom code to implement the different filters on the different portions of the object, and to insert that code into the I/O path for the object. However, this approach may require the owner to duplicate some native functionality of the service 160 (e.g., the ability to retrieve a portion of a data set). Moreover, this approach would inhibit modularity and reusability of code, because a single set of code would be required to conduct two functions (e.g., selecting a portion of the data and filtering that portion).
To address these shortcomings, embodiments of the present disclosure enable owners to create a data manipulation pipeline to be applied to an I/O path, linking multiple data manipulations together, each of which may also be inserted into other I/O paths. An illustrative visualization of such a pipeline is shown in fig. 4 as pipeline 400. In particular, pipeline 400 illustrates that a series of data manipulations specified by an owner occurs when a request method is invoked for an object or set of objects. As shown in FIG. 4, the pipeline begins with input data that is specified within the call according to the called request method. For example, a PUT call may typically include input data as the data to be stored, while a GET call may typically include input data by referencing a stored object. LIST calls may specify a directory whose manifest is the input data for the LIST request method.
In contrast to typical implementations of a request method, in the illustrative pipeline 400 the called request method is not initially applied to the input data. Instead, the input data is initially passed to an execution 404 of "code A," where code A represents a first set of user-authored code. The output of that execution is then passed to a "native function A" 406, which illustratively represents a native function of the service 160, such as a "SELECT" or byte-range function implemented by the object manipulation engine 170. The output of the native function 406 is then passed to an execution 408 of "code B," representing a second set of user-authored code. Thereafter, the output of the execution 408 is passed to the called request method 410 (e.g., GET, PUT, LIST, etc.). Accordingly, rather than the request method being applied to the input data as in conventional techniques, in the illustration of FIG. 4 the request method is applied to the output of the execution 408, which illustratively represents a transformation of the input data in accordance with one or more owner-specified manipulations 412. Notably, implementation of the pipeline 400 may require no action by, and imply no knowledge of the pipeline 400 on the part of, the calling client device 102. Thus, implementation of a pipeline is not expected to affect existing mechanisms of interacting with the service 160 (other than changing the data stored on or retrieved from the service 160 in accordance with the pipeline). For example, implementation of a pipeline is not expected to require reconfiguration of existing programs utilizing an API of the service 160.
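Conceptually, a linear pipeline of this kind behaves as function composition: each stage consumes the output of the previous stage, and the called request method is ultimately applied to the final output rather than to the original input data. A simplified Python sketch (the stage implementations below are hypothetical stand-ins for "code A," a native byte-range function, and "code B"):

```python
from typing import Callable, List

# Each manipulation maps input bytes to output bytes.
Manipulation = Callable[[bytes], bytes]

def run_pipeline(input_data: bytes, stages: List[Manipulation]) -> bytes:
    # Apply each owner-specified manipulation in order; the called
    # request method is later applied to the final output rather than
    # to the original input data.
    data = input_data
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stages mirroring FIG. 4: user code A, a native byte-range
# function, then user code B.
code_a = lambda d: d.replace(b"ssn=123-45-6789", b"ssn=REDACTED")
native_byte_range = lambda d: d[:32]   # stand-in for a native function
code_b = lambda d: d.upper()

result = run_pipeline(b"name=alice;ssn=123-45-6789;tier=gold",
                      [code_a, native_byte_range, code_b])
```

The request method (e.g., PUT) would then be applied to `result` rather than to the original input.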
Although the pipeline 400 of fig. 4 is linear, in some embodiments the service 160 may enable an owner to configure a non-linear pipeline, such as by including conditional or branching nodes within the pipeline. Illustratively, as described in more detail below, data manipulations (e.g., serverless-based functions) may be configured to include a return value, such as an indication of successful execution, of an error encountered, and the like. In one example, the return value of a data manipulation may be used to select a conditional branch within a branched pipeline, such that a first return value causes the pipeline to continue on a first branch, while a second return value causes the pipeline to continue on a second branch. In some cases, a pipeline may include parallel branches, such that data is copied or divided among multiple data manipulations, the outputs of which are passed to a single data manipulation for merging prior to execution of the called method. The service 160 may illustratively provide a graphical user interface through which owners can create pipelines, such as by specifying nodes within the pipeline and linking the nodes together via logical connections. A variety of flow-based development interfaces are known and may be used in conjunction with aspects of the present disclosure.
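One possible model of such a branching pipeline (a sketch only; the node names, return values, and edge encoding are all assumptions) treats each manipulation as returning both output data and a return value, with the return value selecting the next node:

```python
from typing import Callable, Dict, Tuple

# Each manipulation returns (output_data, return_value); the return
# value selects which branch, if any, the pipeline follows next.
Step = Callable[[bytes], Tuple[bytes, str]]

def run_branching(data: bytes, start: str, nodes: Dict[str, Step],
                  edges: Dict[Tuple[str, str], str]) -> bytes:
    node = start
    while node is not None:
        data, rv = nodes[node](data)
        node = edges.get((node, rv))  # no matching edge: pipeline ends
    return data

# Hypothetical branched pipeline: a validation step routes well-formed
# data to a redaction step and malformed data to an annotation step.
nodes = {
    "validate": lambda d: (d, "ok" if d.startswith(b"{") else "error"),
    "redact":   lambda d: (d.replace(b"secret", b"***"), "done"),
    "annotate": lambda d: (b"INVALID:" + d, "done"),
}
edges = {("validate", "ok"): "redact", ("validate", "error"): "annotate"}
```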
Furthermore, in some implementations, a pipeline applied to a particular I/O path may be generated on the fly upon request based on data manipulation applied to the path according to different criteria. For example, an owner of a data set may apply a first data manipulation to all interactions with objects within the set, and a second data manipulation to all interactions obtained via a given URI. Thus, upon receiving a request to interact with an object within the collection via a given URI, the service 160 may generate a pipeline that combines the first data manipulation and the second data manipulation. The service 160 may illustratively implement a standard hierarchy such that manipulations applied to objects are placed within the pipeline prior to manipulations applied to URIs or the like.
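The on-the-fly assembly described above might be sketched as follows, with manipulations attached to an object collection ordered ahead of those attached to a URI per the standard hierarchy (the rule tables and names are hypothetical):

```python
from typing import Callable, Dict, List

Manipulation = Callable[[bytes], bytes]

def build_pipeline(collection_rules: Dict[str, List[Manipulation]],
                   uri_rules: Dict[str, List[Manipulation]],
                   collection: str, uri: str) -> List[Manipulation]:
    # Standard hierarchy: manipulations attached to the object
    # collection are placed within the pipeline before manipulations
    # attached to the URI used to reach it.
    return collection_rules.get(collection, []) + uri_rules.get(uri, [])

# Hypothetical rules: each stage tags the data so the resulting order
# is observable.
collection_rules = {"reports": [lambda d: d + b"|collection-filter"]}
uri_rules = {"/public": [lambda d: d + b"|uri-filter"]}
pipeline = build_pipeline(collection_rules, uri_rules, "reports", "/public")
```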
In some implementations, the client device 102 may be able to request that data manipulation be included within the pipeline. For example, within the parameters of the GET request, the client device 102 may specify a particular data manipulation to be included within the pipeline of the combined request application. Illustratively, the collection owner may specify one or more data manipulations that are allowed for the collection, and further specify identifiers (e.g., function names) for those manipulations. Thus, when a request interacts with a collection, the client device 102 may specify an identifier to cause the manipulation to be included within a pipeline applied to the I/O path. In one embodiment, manipulation of client requests is appended to the end of the pipeline after owner-specified manipulation of data and before the requested request method is implemented. For example, in the case where the client device 102 requests to obtain a data set and requests to apply a search function to the data set prior to implementing the GET method, the search function may receive as input data an output of a data manipulation specified for an owner of the data set (e.g., a manipulation to remove confidential information from the data set). Additionally, in some implementations, the request may specify parameters to be passed to one or more data manipulations (whether specified within the request or not). Thus, while embodiments of the present disclosure may implement data manipulation without knowledge of those manipulations with respect to the client device 102, other embodiments may enable the client device 102 to communicate information within an I/O request for implementing the data manipulation.
Furthermore, while example embodiments of the present disclosure are discussed with respect to manipulation of the input data to a called method, embodiments of the present disclosure may also be used to modify aspects of the request itself, including the called method. For example, a serverless task execution may be passed the content of the request (including, e.g., the method called and its parameters), and may be configured to modify the method or parameters and return the modified versions to the front end 162 as a return value. Illustratively, where the client device 102 is authenticated as a user with access to only a portion of a data object, the serverless task execution may be passed a call to "GET" the data object, and may transform the parameters of the GET request such that the request applies only to a specific byte range of the object corresponding to the portion that the user may access. As a further example, tasks may be utilized to implement customized parsing or restriction of called methods, such as by limiting the methods a user may call, the parameters of those methods, and the like. In some cases, the application of one or more functions to a request (e.g., to modify the method called or the method parameters) may be viewed as a "pre-data-processing" pipeline, and may thus be implemented prior to obtaining the input data within the pipeline 400 (which input data may change due to changes in the request) or independently of the data manipulation pipeline 400.
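A "pre-data-processing" manipulation of this kind might be sketched as follows (the request structure, user names, and authorization table are illustrative assumptions, not part of the disclosed service's interface):

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Request:
    method: str
    key: str
    params: Dict[str, str]

# Hypothetical per-user authorization table: user -> (first, last) byte
# of the portion of the object the user may access.
AUTHORIZED_RANGE: Dict[str, Tuple[int, int]] = {"analyst": (0, 1023)}

def restrict_request(req: Request, user: str) -> Request:
    # Rewrite a GET so that it applies only to the byte range the
    # authenticated user may access; other requests pass unmodified.
    rng: Optional[Tuple[int, int]] = AUTHORIZED_RANGE.get(user)
    if req.method == "GET" and rng is not None:
        params = dict(req.params)
        params["range"] = f"bytes={rng[0]}-{rng[1]}"
        return Request(req.method, req.key, params)
    return req
```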
Similarly, while example embodiments of the present disclosure are discussed with respect to application of a called method to the output data of one or more data manipulations, in some embodiments manipulations may additionally or alternatively occur after application of the called method. For example, a data object may contain sensitive data that the data's owner desires to remove prior to providing the data to a client. The owner may additionally enable a client to specify native operations on the data set, such as conducting a database query on the data set (e.g., via a SELECT resource method). While the owner could specify a pipeline for the data set such that filtering of sensitive data occurs prior to application of the SELECT method, such an ordering of operations may be undesired, as the filtering would then be conducted with respect to the entire data object rather than only the portion returned by the SELECT query. Accordingly, in addition to or as an alternative to specifying manipulations that occur prior to satisfying a request method, embodiments of the present disclosure can enable an owner to specify manipulations that occur after application of the called method but prior to conducting a final operation to satisfy the request. For example, in the case of a SELECT operation, the service 160 may first conduct the SELECT operation against the specified input data (e.g., a data object) and then pass the output of that SELECT operation to a data manipulation, such as a serverless task execution. The output of that execution may then be returned to the client device 102 to satisfy the request.
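The ordering described above, with a native SELECT-like operation applied first and an owner-specified manipulation applied only to its output, might be sketched as follows (the CSV format, column names, and redaction rule are illustrative assumptions):

```python
import csv
import io

def native_select(obj: str, column: str, value: str) -> str:
    # Stand-in for a native SELECT: keep only rows whose column
    # matches the given value.
    rows = list(csv.DictReader(io.StringIO(obj)))
    kept = [r for r in rows if r[column] == value]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()

def redact_task(data: str) -> str:
    # Owner-specified manipulation applied AFTER the SELECT, so only
    # the selected rows are filtered.
    return data.replace("555-0100", "***")

obj = "name,city,phone\r\nana,lisbon,555-0100\r\nbo,porto,555-0199\r\n"
selected = native_select(obj, "city", "lisbon")
response = redact_task(selected)
```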
While fig. 3 and 4 are generally described with reference to serverless tasks authored by the owners of an object or collection, in some cases, service 160 may enable code authors to share their tasks with other users of service 160 such that the code of a first user is executed in the I/O path of an object owned by a second user. The service 160 may also provide a task library for each user to use. In some cases, the code of the shared task may be provided to other users. In other cases, the code of the shared task may be hidden from other users so that other users may perform the task but cannot view the code of the task. In these cases, other users may illustratively be able to modify certain aspects of code execution, such as permissions according to which code is to be executed.
Referring to fig. 5A and 5B, illustrative interactions will be discussed for applying a modification to an I/O path for a request to store an object on the service 160, which request is referred to in connection with these figures as a "PUT" request or "PUT object call." The interactions span both figures, with interaction numbering maintained between fig. 5A and 5B.
The interactions begin at (1), where the client device 102A submits to the storage service 160 a PUT object call corresponding to a request to store input data (e.g., included or specified within the call) on the service 160. The input data may correspond, for example, to a file stored on the client device 102A. As shown in FIG. 5A, the call is directed to the front end 162 of the service 160, which, at (2), retrieves from the I/O path modification data store 164 an indication of a modification to the I/O path for the call. The indication may reflect, for example, a pipeline to be applied to calls received on the I/O path. The I/O path for a call may generally be specified with respect to the request method included in the call, an object or collection of objects indicated within the call, a particular mechanism of reaching the service 160 (e.g., a protocol or URI used), an identity or authentication status of the client device 102A, or a combination thereof. For example, in FIG. 5A, the I/O path used may correspond to use of a PUT request method directed to a particular URI (e.g., associated with the front end 162) to store an object in a particular logical location (e.g., a specific bucket) on the service 160. In fig. 5A and 5B, it is assumed that the owner of that logical location has previously specified a modification to the I/O path, and in particular has specified that a serverless function should be applied to the input data, with the result of that function then being stored in the service 160.
Thus, at (3), front end 162 detects the inclusion of the execution of the serverless task within the modification of the I/O path. Thus, at (4), the front end 162 submits the call to the on-demand code execution system 120 to perform the task specified in the modification for the input data specified in the call.
At (5), the on-demand code execution system 120 thus generates an execution environment 502 in which code corresponding to the task is executed. Illustratively, the call may be directed to the front end 130 of the system 120, which may issue instructions to the worker manager 140 to select or generate a VM instance 150 in which to execute the task, the VM instance 150 illustratively representing the execution environment 502. During generation of the execution environment 502, the system 120 further provisions the environment with code 504 for the task indicated within the I/O path modification (which code may be retrieved, for example, from the object data store 166). Although not shown in fig. 5A, the environment 502 further includes other dependencies of the code, such as access to an operating system, a runtime required to execute the code, and the like.
In some embodiments, generation of the execution environment 502 can include configuring the environment 502 with security constraints limiting access to network resources. Illustratively, where a task is intended to manipulate data without reference to network resources, the environment 502 can be configured with no ability to send or receive information via a network. Where a task is intended to utilize network resources, access to such resources can be provided on a "whitelist" basis, such that network communications from the environment 502 are allowed only for specified domains, network addresses, or the like. Network restrictions may be implemented, for example, by a host device hosting the environment 502 (e.g., by a hypervisor or host operating system). In some instances, network access requirements may be utilized to assist in the logical or physical placement of the environment 502. For example, where a task requires no access to network resources, the environment 502 for the task may be placed on a host device that is distant from other network-accessible services of the service provider system 110, such as an "edge" device with a lower-quality communication channel to those services. Where a task requires access to otherwise private network services, such as services implemented within a virtual private cloud (e.g., a local-area-network-like environment implemented on the service 160 on behalf of a given user), the environment 502 may be created to exist logically within that cloud, such that a task executing within the environment 502 accesses resources within the cloud. In some instances, a task may be configured to execute within a private cloud of a client device 102 that submits an I/O request. In other instances, a task may be configured to execute within a private cloud of an owner of the object or collection referenced within the request.
In addition to generating the environment 502, at (6), the system 120 provides the environment with stream-level access to an input file handle 506 and an output file handle 508, usable to read and write the input data and output data of the task execution, respectively. In one embodiment, the file handles 506 and 508 may point to a (physical or virtual) block storage device (e.g., a disk drive) attached to the environment 502, such that the task can interact with a local file system to read the input data and write the output data. For example, the environment 502 may represent a virtual machine with a virtual disk drive, and the system 120 may obtain the input data from the service 160 and store the input data on the virtual disk drive. Thereafter, on execution of the code, the system 120 may pass to the code a handle to the input data as stored on the virtual disk drive, and a handle to a file on the drive to which the output data should be written. In another embodiment, the file handles 506 and 508 may point to a network file system, such as an NFS-compliant file system, on which the input data has been stored. For example, during processing of the call, the front end 162 may store the input data as an object on the object data store 166, and the file-level interface 166 may provide file-level access to the input data, as well as to a file representing the output data. In some cases, the file handles 506 and 508 may point to files on a virtual file system, such as a file system in user space. By providing the handles 506 and 508, the task code 504 is enabled to read the input data and write the output data using stream manipulations, as opposed to being required to implement network transmissions. Creation of the handles 506 and 508 (or the streams corresponding to the handles) may illustratively be achieved by execution of staging code 157 within or associated with the environment 502.
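The role of the staging step can be sketched as follows, assuming a simple variant in which the input data is staged to a local file and the task is launched with its standard streams bound to the input and output files (the real service may instead attach block storage or a network/user-space file system, as described above):

```python
import os
import subprocess
import sys
import tempfile

def run_task(task_source: str, input_data: bytes) -> bytes:
    # Illustrative staging sketch: write the input data to a local
    # file, execute the task with stdin/stdout redirected to the input
    # and output files, and collect whatever the task wrote as the
    # output data of the execution.
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input")
        out_path = os.path.join(workdir, "output")
        with open(in_path, "wb") as f:
            f.write(input_data)
        with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
            # The task sees only its standard streams; it neither opens
            # a network connection nor creates the streams itself.
            subprocess.run([sys.executable, "-c", task_source],
                           stdin=fin, stdout=fout, check=True)
        with open(out_path, "rb") as f:
            return f.read()
```

Because the streams are created by the staging logic rather than by the task, the task itself needs no permissions beyond reading its input and writing its output.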
The interactions of FIG. 5A are continued in FIG. 5B, where the system 120 executes the task code 504 at (7). Because the task code 504 may be user-authored, any number of functionalities may be implemented within the code 504. However, for the purposes of description of fig. 5A and 5B, it will be assumed that the code 504, when executed, reads input data from the input file handle 506 (which may be passed as a commonly used input stream, such as stdin), manipulates the input data, and writes output data to the output file handle 508 (which may be passed as a commonly used output stream, such as stdout). Accordingly, at (8), the system 120 obtains the data written to the output file (e.g., the file referenced within the output file handle) as output data of the execution. In addition, at (9), the system 120 obtains a return value of the code execution (e.g., a value passed in a final call of the function). For the purposes of description of fig. 5A and 5B, it will be assumed that the return value indicates success of the execution. At (10), the output data and the success return value are then passed to the front end 162.
Although shown as a single interaction in fig. 5B, in some embodiments, the output data of the task execution and the return value of the execution may be returned separately. For example, during execution, the task code 504 may write to the output file through the handle 508, and the data may be returned to the service 160 periodically or iteratively. Illustratively, where the output file exists on a file system in the user space implemented by the staging code, the staging code may detect each write to the output file and forward it to the front end 162. In the case where the output file exists on a network file system, writing to the file may directly cause the written data to be transferred to interface 166 and thus to service 160. In some cases, iteratively transmitting the written data may reduce the amount of memory required locally to environment 502, as the written data may be deleted from the local storage device of environment 502 according to some embodiments.
In addition, while a success return value is assumed in fig. 5A and 5B, other types of return value are possible and contemplated. For example, an error return value may be used to indicate to the front end 162 that an error occurred during execution of the task code 504. As another example, user-defined return values may be used to control how a conditional branch within a pipeline proceeds. In some instances, a return value may indicate to the front end 162 a request for further processing. For example, a task execution may return to the front end 162 a call to execute another serverless task (which task may not be specified within the path modification for the current I/O path). Furthermore, the return value may specify to the front end 162 what return value should be provided to the client device 102A. For example, a typical PUT request method invoked at the service 160 may be expected to return an HTTP 200 ("OK") code. A success return value from the task code may therefore further indicate that the front end 162 should return the HTTP 200 code to the client device 102A, while an error return value may, for example, indicate that the front end 162 should return a 3XX HTTP redirection code or a 4XX HTTP error code to the client device 102A. Still further, in some cases, the data to be returned to the client device 102A may be decoupled from the return value passed to the front end 162. For example, the front end 162 may be configured to return a given HTTP code (e.g., 200) for any request successfully received at the front end 162 that invokes a data processing pipeline, and the task execution may be configured to specify, within its return value, data to be passed to the client device 102A in addition to that HTTP code. Such data may illustratively include structured data (e.g., Extensible Markup Language (XML) data) providing information generated by the task execution, such as data indicating the success or failure of the task.
Such an approach may advantageously enable the front end 162 to quickly respond to requests (e.g., without waiting for execution of a task) while still enabling task execution to communicate information to the client device 102.
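One way such a mapping from a task's return value to the response sent by the front end might look (the status names, fields, and specific HTTP codes below are assumptions for illustration, not a fixed contract of the service):

```python
from typing import Dict, Tuple

def response_for(return_value: Dict[str, str]) -> Tuple[int, str]:
    # Map a task's return value to the (status code, body) pair the
    # front end sends to the client.
    status = return_value.get("status")
    if status == "success":
        return 200, return_value.get("body", "OK")
    if status == "redirect":
        return 302, return_value.get("location", "/")
    # Any other status is treated as an error; the task may still
    # supply structured data describing the failure.
    return 400, return_value.get("error", "task failed")
```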
For the purposes of this description, it will be assumed that the success return value of the task indicates that an HTTP 2XX success response should be passed to the device 102A. Accordingly, on receiving the output data, the front end 162, at (11), stores the output data as an object within the object data store 166. Interaction (11) illustratively corresponds to implementation of the PUT request method initially invoked by the client device 102A, albeit by storing the output of the task execution rather than the provided input data. After implementing the invoked PUT request method, the front end 162, at (12), returns to the client device 102A the indication of success (e.g., an HTTP 200 response code) indicated by the success return value of the task. Thus, from the perspective of the client device 102A, a call to PUT an object on the service 160 resulted in creation of that object on the service 160. However, rather than storing the input data provided by the device 102A, the object stored on the service 160 corresponds to the output data of the owner-specified task, thereby enabling the owner of the object greater control over the content of that object. In some use cases, the service 160 may additionally store the input data as an object (e.g., where the owner-specified task corresponds to code executable to provide output data usable in conjunction with the input data, such as a checksum generated from the input data).
Referring to fig. 6A and 6B, illustrative interactions will be discussed for applying a modification to an I/O path for a request to retrieve an object from the service 160, which request is referred to in connection with these figures as a "GET" request or "GET call." The interactions span both figures, with interaction numbering maintained between fig. 6A and 6B.
The interactions begin at (1), where the client device 102A submits to the storage service 160 a GET call corresponding to a request to obtain the data of an object (identified within the call) stored on the service 160. As shown in FIG. 6A, the call is directed to the front end 162 of the service 160, which, at (2), retrieves from the I/O path modification data store 164 an indication of a modification to the I/O path for the call. For example, in FIG. 6A, the I/O path used may correspond to use of a GET request method directed to a particular URI (e.g., associated with the front end 162) to retrieve an object from a particular logical location (e.g., a specific bucket) on the service 160. In fig. 6A and 6B, it is assumed that the owner of that logical location has previously specified a modification to the I/O path, and in particular has specified that a serverless function should be applied to the object, with the result of that function then being returned to the device 102A as the requested object.
Thus, at (3), front end 162 detects the inclusion of the execution of the serverless task within the modification of the I/O path. Thus, at (4), the front end 162 submits the call to the on-demand code execution system 120 to execute the task specified in the modification for the object specified in the call. At (5), the on-demand code execution system 120 thus generates an execution environment 502 in which code corresponding to the task is executed. Illustratively, the call may be directed to the front end 130 of the system, which may issue instructions to the work manager 140 to select or generate a VM instance 150 in which to execute the task, the VM instance 150 illustratively representing the execution environment 502. During generation of the execution environment 502, the system 120 also provides the environment with code 504 (which may be retrieved, for example, from the object data store 166) for tasks indicated in the I/O path modification. Although not shown in fig. 6A, environment 502 also includes other dependencies of code, such as access to an operating system, a runtime required to execute the code, and so forth.
In addition, at (6), the system 120 provides the environment with file-level access to an input file handle 506 and an output file handle 508, which may be used to read and write input data (the object) and output data, respectively, for the task execution. As discussed above, file handles 506 and 508 may point to (physical or virtual) block storage (e.g., a disk drive) attached to the environment 502, so that the task may interact with a local file system to read input data and write output data. For example, environment 502 may represent a virtual machine with a virtual disk drive, and at (6'), system 120 may obtain the object referenced in the call from service 160 and store the object on the virtual disk drive. Thereafter, upon executing the code, the system 120 may pass to the code a handle to the object as stored on the virtual disk drive, and a handle to a file on the drive to which the output data should be written. In another embodiment, file handles 506 and 508 may point to a network file system, such as an NFS-compliant file system, on which the object has been stored. For example, the file-level interface 166 may provide file-level access to the object as stored in the object data store, as well as to a file representing the output data. By providing handles 506 and 508, the task code 504 can use stream manipulation to read input data and write output data, rather than being required to implement network transmissions. Creation of handles 506 and 508 may illustratively be achieved by execution of staging code 157 within or associated with environment 502.
The interaction of FIG. 6A continues in FIG. 6B, where at (7) the system 120 executes the task code 504. Because the task code 504 may be user-authored, any number of functionalities may be implemented within the code 504. However, for purposes of describing FIGS. 6A and 6B, it is assumed that the code 504, when executed, reads input data (corresponding to the object identified within the call) from the input file handle 506 (which may be passed as a common input stream, such as stdin), manipulates the input data, and writes output data to the output file handle 508 (which may be passed as a common output stream, such as stdout). Thus, at (8), the system 120 obtains the data written to the output file (e.g., the file referenced in the output file handle) as the output data of the execution. Additionally, at (9), the system 120 obtains a return value of the code execution (e.g., a value returned on completion of the function). For purposes of describing FIGS. 6A and 6B, it will be assumed that the return value indicates success of the execution. At (10), the output data and the success return value are then passed to the front end 162.
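As a hedged illustration of such task code (not the actual code 504), the following reads the input stream passed as stdin, applies a simple manipulation (redacting lines containing a hypothetical sensitive marker), and writes the result to the output stream passed as stdout:

```python
import sys

def manipulate(in_stream, out_stream):
    # Read input data line by line from the input handle (e.g., stdin),
    # redact anything that looks sensitive, and write to the output handle.
    for line in in_stream:
        if "ssn:" in line.lower():
            out_stream.write("[REDACTED]\n")
        else:
            out_stream.write(line)
    return "success"  # return value reported back to the front end

if __name__ == "__main__":
    manipulate(sys.stdin, sys.stdout)
```

Because the handles behave as ordinary file streams, the same code runs unchanged whether the handles are backed by a local disk, a FUSE file system, or a network file system.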
Upon receiving the output data and the return value, the front end 162 returns the output data of the task execution as the requested object. Interaction (11) thus illustratively corresponds to the implementation of the GET request method initially invoked by client device 102A, but returns the output of the task execution rather than the object specified within the invocation. From the perspective of the client device 102A, therefore, a call to GET an object from the storage service 160 results in return of data to the client device 102A as that object. However, rather than returning the object as stored on the service 160, the data provided to the client device 102A corresponds to output data of an owner-specified task, thereby enabling the owner of the object to better control the data returned to the client device 102A.
Similarly to as discussed above with respect to FIGS. 5A and 5B, although shown as a single interaction in FIG. 6B, in some embodiments the output data of a task execution and the return value of that execution may be returned separately. In addition, while a success return value is assumed in FIGS. 6A and 6B, other types of return values are possible and contemplated, such as error values, pipeline-control values, or calls to perform other data manipulations. Further, the return value may indicate what response (e.g., what HTTP status code) to return to client device 102A. In some cases, where output data is provided iteratively by a task execution, the output data may also be provided iteratively by front end 162 to client device 102A. Where the output data is large (e.g., on the order of hundreds of megabytes, gigabytes, etc.), iteratively returning the output data to the client device 102A can enable that data to be provided as a stream, thus expediting delivery of the content to the client device 102A relative to delaying return of the data until task execution is complete.
While illustrative interactions are described above with reference to FIGS. 5A-6B, various modifications to these interactions are possible and contemplated herein. For example, while the interactions described above relate to manipulation of input data, in some embodiments serverless tasks may be inserted into the I/O path of the service 160 to perform functions other than data manipulation. Illustratively, a serverless task may be utilized to perform validation or authorization with respect to a called request method, to verify that a client device 102A is authorized to perform the method. Task-based validation or authorization may enable functions not provided natively by the service 160. For example, consider a collection owner who desires to limit certain client devices 102 to accessing only objects in the collection created during a certain time frame (e.g., the last 30 days, any time other than the last 30 days, etc.). While the service 160 may natively provide authorization on a per-object or per-collection basis, it may not natively provide authorization based on an object's creation time. Accordingly, embodiments of the present disclosure enable the owner to insert into the I/O path for the collection (e.g., the GET path using a given URI to the collection) a serverless task that determines whether a client is authorized to retrieve a requested object based on the creation time of that object. Illustratively, the return value provided by execution of the task may correspond to an "authorized" or "unauthorized" response. In the case where a task does not perform data manipulation, it may be unnecessary to provision the environment in which the task executes with input and output stream handles. Accordingly, in these cases, the service 160 and the system 120 may be configured to forgo providing such handles to the environment.
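A minimal sketch of such an authorization-only task follows, assuming an illustrative metadata key (`object_creation_time`, as epoch seconds) that is not part of any actual service API:

```python
import time

def authorize(request_metadata, max_age_days=30):
    # Return value consumed by the front end: "authorized"/"unauthorized".
    created = request_metadata["object_creation_time"]  # epoch seconds (assumed key)
    age_days = (time.time() - created) / 86400.0
    return "authorized" if age_days <= max_age_days else "unauthorized"
```

Because the task receives only request metadata and emits only a return value, no input or output stream handles are needed, matching the case described above.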
For example, whether a task implements data manipulation may be specified at the time the task is created, and stored as metadata for the task (e.g., within the object data store 166). The service 160 may thus determine from that metadata whether data manipulation within the task should be supported by provisioning of the appropriate stream handles.
While some embodiments may utilize return values without use of stream handles, other embodiments may instead utilize stream handles without use of return values. For example, while the interactions described above relate to providing a return value of a task execution to the storage service 160, in some instances the system 120 may be configured to detect completion of a function based on interaction with an output stream handle. Illustratively, staging code within the environment (e.g., providing a user space file system or network-based file system) may detect a call to deallocate the stream handle (e.g., by calling a "file.close()" function or the like). The staging code may interpret such a call as successful completion of the function, and notify the service 160 of the successful completion without requiring the task execution to explicitly provide a return value.
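The close-detection approach can be sketched as a wrapper that staging code might place around the output stream; the wrapper class and callback are hypothetical, shown only to illustrate treating deallocation of the handle as a completion signal:

```python
class NotifyingStream:
    """Wraps an output stream; reports completion when the handle is closed."""

    def __init__(self, underlying, on_close):
        self._stream = underlying
        self._on_close = on_close

    def write(self, data):
        return self._stream.write(data)

    def close(self):
        self._stream.close()
        # Interpret the close as successful completion of the function,
        # in lieu of an explicit return value from the task.
        self._on_close("success")
```

In a real user space file system, the release of the file handle plays the role of the `close()` call here.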
While the interactions described above generally relate to passing of input data to a task execution, additional or alternative information may be passed to the execution. By way of non-limiting example, such information may include the content of the request from the client device 102 (e.g., the HTTP data transmitted), metadata regarding the request (e.g., a network address from which the request was received or a time of the request), metadata regarding the client device 102 (e.g., an authentication status of the device, account age, or request history), or metadata regarding the requested object or collection (e.g., size, storage location, permissions, or time of creation, modification, or access). Moreover, in addition to or as an alternative to manipulation of input data, task executions may be configured to modify metadata regarding the input data, which may be stored together with the input data (e.g., within the object) and thus written by way of an output stream handle, or which may be stored separately and thus modified by way of a metadata stream handle, inclusion of the metadata in a return value, or a separate network transmission to the service 160.
With reference to FIG. 7, an illustrative routine 700 will be described for implementing owner-defined functions in connection with an I/O request obtained over an I/O path at the object storage service of FIG. 1. The routine 700 may illustratively be implemented subsequent to association of an I/O path (e.g., defined in terms of an object or collection, a mechanism of access to the object or collection (such as a URI), an account transmitting I/O requests, etc.) with a pipeline of data manipulations. For example, the routine 700 may be implemented subsequent to the interactions of FIG. 3, discussed above. The routine 700 is illustratively implemented by the front end 162.
The routine 700 begins at block 702, where the front end 162 obtains a request to apply an I/O method to input data. The request is illustratively obtained from a client device (e.g., an end-user device). The I/O method may correspond, for example, to an HTTP request method, such as GET, PUT, LIST, DELETE, etc. The input data may be included within the request (e.g., within a PUT request), or referenced in the request (e.g., as an existing object on the object storage service 160).
At block 704, the front end 162 determines one or more data manipulations in the I/O path for the request. As noted above, the I/O path may be defined based on a variety of criteria (or combinations thereof), such as the object or collection referenced in the request, the URI through which the request was transmitted, the account associated with the request, etc. Manipulations for each defined I/O path may illustratively be stored at the object storage service 160. Accordingly, at block 704, the front end 162 may compare the parameters of the I/O path for the request against stored data manipulations at the object storage service 160 to determine the data manipulations inserted into the I/O path. In one embodiment, the manipulations form a pipeline, such as the pipeline 400 of FIG. 4, which may have been previously stored or which may be constructed by the front end 162 at block 704 (e.g., by combining multiple manipulations that apply to the I/O path). In some instances, an additional data manipulation may be specified within the request, and may, for example, be inserted before pre-specified data manipulations (e.g., those not specified within the request). In other instances, the request may exclude reference to any data manipulation.
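Block 704 can be illustrated with a toy matcher, under an assumed storage shape (a list of dicts in which `None` acts as a wildcard) that is not the service's real schema; it collects every stored manipulation whose criteria match the request's path parameters into a pipeline:

```python
def manipulations_for_request(stored_paths, collection, method, account):
    # Compare the request's I/O-path parameters against each stored
    # modification; None in a stored criterion matches any value.
    pipeline = []
    for path in stored_paths:
        if (path.get("collection") in (None, collection)
                and path.get("method") in (None, method)
                and path.get("account") in (None, account)):
            pipeline.extend(path["manipulations"])
    return pipeline
```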
At block 706, the front end 162 passes the input data of the I/O request to the initial data manipulation of the I/O path. The initial data manipulation may include, for example, a native manipulation of the object storage service 160 or a serverless task defined by an owner of the object or collection referenced in the call. Illustratively, where the initial data manipulation is a native manipulation, the front end 162 may pass the input to the object manipulation engine 170 of FIG. 1. Where the initial data manipulation is a serverless task, the front end 162 may pass the input to the on-demand code execution system 120 of FIG. 1 for processing via execution of the task. An illustrative routine for implementing a serverless task is described below with reference to FIG. 8.
Although FIG. 7 illustratively describes data manipulations, in some instances other processing may be applied to an I/O path by an owner. For example, an owner may insert into the I/O path for an object or collection a serverless task that provides authorization independent of data manipulation. Accordingly, in some embodiments block 706 may be modified such that other data, such as metadata regarding the request or the object specified in the request, is passed to an authorization function or other path manipulation.
Thereafter, the routine 700 proceeds to block 708, where the implementation of the routine 700 changes depending on whether additional data manipulation is associated with the I/O path. If so, the routine 700 proceeds to block 710, where the output of the previous manipulation is passed to the next manipulation associated with the I/O path (e.g., a subsequent stage of the pipeline).
After block 710, the routine 700 then returns to block 708, until no additional manipulations remain to be implemented. The routine 700 then proceeds to block 712, where the front end 162 applies the called I/O method (e.g., GET, PUT, POST, LIST, DELETE, etc.) to the output of the prior manipulation. For example, the front end 162 may provide the output as a result of a GET or LIST request, or may store the output as a new object as a result of a PUT or POST request. The front end 162 may further provide a response to the request to a requesting device, such as an indication of success of the routine 700 (or, in cases of failure, failure of the routine). In one embodiment, the response may be determined by a return value provided by a data manipulation implemented at block 706 or 710 (e.g., the final manipulation implemented before an error or success). For example, a manipulation that indicates an error (e.g., lack of authorization) may specify an HTTP code indicating that error, while a successfully applied manipulation may instruct the front end 162 to return an HTTP code indicating success, or may instruct the front end 162 to return a code otherwise associated with application of the I/O method (e.g., as applied without data manipulations). The routine 700 thereafter ends at block 714.
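Blocks 706 through 712 reduce to a simple loop, sketched here with assumed conventions (each manipulation returns the manipulated data plus a return value, and the I/O method is a callable) rather than the front end's actual interfaces:

```python
def apply_io_path(input_data, manipulations, io_method):
    data = input_data
    for manipulate in manipulations:      # blocks 706-710: pipeline stages
        data, return_value = manipulate(data)
        if return_value != "success":     # e.g., an error return value
            return return_value           # halt and surface the error
    return io_method(data)                # block 712: apply GET/PUT/etc.
```

Note that the I/O method is applied to the final manipulation's output, so the data stored or returned may differ from the input named in the request, as discussed below.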
It should be noted that application of the called method to the output, as opposed to the input specified in the initial request, may alter the data stored in or retrieved from the object storage service 160. For example, data stored as an object on the service 160 may differ from the data submitted within the request to store that data. Similarly, data retrieved from the system as an object may not match the object as stored on the system. Accordingly, implementation of the routine 700 enables an owner of data objects to assert greater control over I/O to an object or collection stored on the object storage service 160 on the owner's behalf.
In some cases, additional or alternative blocks may be included within the routine 700, or implementation of such blocks may include additional or alternative operations. For example, as discussed above, in addition to or as an alternative to providing output data, a serverless task execution may provide a return value. In some instances, this return value may instruct the front end 162 to take additional action in implementing manipulations. For example, an error return value may instruct the front end 162 to halt implementation of manipulations and provide a specified error value (e.g., an HTTP error code) to a requesting device. Another return value may instruct the front end 162 to implement an additional serverless task or manipulation. Accordingly, in some cases, the routine 700 may be modified to include, e.g., after blocks 706 and 710, processing of return values provided by a prior manipulation (or block 708 may be modified to include processing of such values). The routine 700 is thus intended to be illustrative in nature.
With reference to FIG. 8, an illustrative routine 800 will be described for performing tasks on the on-demand code execution system of FIG. 1 to implement data manipulation during implementation of the owner-defined function. The routine 800 is illustratively implemented by the on-demand code execution system 120 of FIG. 1.
Routine 800 begins at block 802, where system 120 obtains a call to implement a stream manipulation task (e.g., a task that manipulates data provided as an input IO stream handle). For example, the call may be obtained in connection with block 706 or 710 of routine 700 of FIG. 7. The call may include input data for the task as well as other metadata, such as metadata of the request prior to the call, metadata of the object referenced within the call, and so forth.
At block 804, the system 120 generates an execution environment for the task. Generation of an environment may include, for example, generation of a container or virtual machine instance in which the task may execute, and provisioning of the environment with code of the task as well as any dependencies of that code (e.g., runtimes, libraries, etc.). In one embodiment, the environment is generated with network permissions corresponding to permissions specified for the task. As discussed above, such permissions may be restrictively (as opposed to permissively) set, e.g., according to a whitelist. As such, if the owner of the I/O path does not specify permissions, the environment may lack network access. Because tasks operate to manipulate streams, rather than network data, such a restrictive model can improve security without detrimental effect on functionality. In some embodiments, the environment may be generated at a logical network location providing access to otherwise restricted network resources. For example, the environment may be generated within a virtual private local area network (e.g., a virtual private cloud environment) associated with a calling device.
At block 806, the system 120 stages the environment with an IO stream representing the input data. Illustratively, the system 120 may configure the environment with a file system that includes the input data, and pass to the task code a handle enabling access to the input data as a file stream. For example, the system 120 may configure the environment with a network file system providing network-based access to the input data (e.g., as stored on the object storage system). In another example, the system 120 may configure the environment with a "local" file system (e.g., from the point of view of an operating system providing the file system), and copy the input data to the local file system. The local file system may, for example, be a filesystem in user space (FUSE). In some instances, the local file system may be implemented on a virtualized disk drive, provided by the host device of the environment or by a network-based device (e.g., as a network-accessible block storage device). In other embodiments, the system 120 may provide the IO stream by "piping" the input data to the execution environment, by writing the input data to a network socket of the environment (which may not provide access to an external network), etc. The system 120 further configures the environment with stream-level access to an output stream, such as by creating a file on the file system for the output data and enabling the task execution to create such a file, by piping a handle of the environment (e.g., stdout) to a location on another VM instance co-located with the environment or to a hypervisor of the environment, etc.
At block 808, the task is executed within the environment. Execution of the task may include executing code of the task and passing, to that execution, handles of the input and output streams. For example, the system 120 may pass a handle of the input data, as stored on the file system, to the execution as a "stdin" variable. The system may further pass a handle of the output data stream to the execution, e.g., as a "stdout" variable. In addition, the system 120 may pass other information, such as metadata of the request or of an object or collection specified within the request, as parameters to the execution. The code of the task may thus execute to conduct stream manipulations on the input data according to functions of the code, and to write an output of the execution to the output stream using OS-level stream operations.
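The handle passing of block 808 can be approximated with ordinary OS facilities; the following sketch (an analogy, not the system 120's actual implementation) runs a task program with its stdin connected to a file holding the input data and its stdout captured to an output file:

```python
import subprocess
import sys
import tempfile

def run_task_with_streams(task_argv, input_bytes):
    # Stage the input in a temporary file passed as stdin; capture stdout
    # to a second file, mirroring the input and output stream handles.
    with tempfile.TemporaryFile() as in_f, tempfile.TemporaryFile() as out_f:
        in_f.write(input_bytes)
        in_f.seek(0)
        completed = subprocess.run(task_argv, stdin=in_f, stdout=out_f)
        out_f.seek(0)
        # Output data plus the execution's return value (exit code here).
        return out_f.read(), completed.returncode
```

From the task program's perspective, reading stdin and writing stdout suffices; no network transmission is implemented by the task itself.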
The routine 800 then proceeds to block 810, where the system 120 returns the data written to the output stream as output data of the task (e.g., to the front end 162 of the object storage system). In one embodiment, block 810 may occur subsequent to completion of the task execution, and as such, the system 120 may return the written data as the complete output data of the task. In other instances, block 810 may occur during execution of the task. For example, the system 120 may detect new data written to the output stream and return that data immediately, without awaiting completion of the task execution. Illustratively, where the output stream is written to an output file, the system 120 may delete data of the output file after that data has been sent, such that sending new data as it is written avoids a need for the file system to maintain storage sufficient to hold all output data of the task execution. Still further, in some embodiments, block 810 may occur on detecting a close of the output stream handle describing the output stream.
Additionally, at block 812, after execution is complete, the system 120 returns a return value provided by the execution (e.g., back to the front end 162 of the object storage system). The return value may specify the outcome of the execution, such as success or failure. In some cases, the return value may specify a next action to take, such as implementing additional data manipulation. Further, the return value may specify data to be provided to a calling device requesting an I/O operation on the data object, such as HTTP code to be returned. As discussed above, front end 162 may obtain such return values and take appropriate actions, such as returning error or HTTP code to the calling device, implementing additional data manipulation, performing I/O operations on the output data, and so forth. In some cases, the return value may be specified explicitly in the code of the task. In other cases, such as where a return value is not specified in the code, a default return value (e.g., a "1" indicating success) may be returned. Routine 800 then ends at block 814.
Clients typically desire the ability to process data (such as determining a checksum value for a file, or performing some other function) once the data has been uploaded to an object storage service, in order to confirm the integrity of the uploaded data. However, current techniques generally require waiting until a complete file has been uploaded, even when the file is split into separate parts and the parts are uploaded in parallel (e.g., using a multipart upload procedure, a term used herein to refer to any procedure that individually uploads multiple parts or sub-objects and later combines them into a complete file or object, sometimes referred to as a reassembled or unified file or object), after which the reassembled (or unified) complete file may be processed. Where multipart uploads are supported, embodiments enable insertion of processing functions into the input/output path of each part, such that a separate intermediate or initial (or first) function can be performed on each part. In addition, embodiments further enable insertion of a processing function that combines the separate intermediate or initial function outputs (e.g., checksum values for each portion of an input file, etc.) to determine a final (or second) function output associated with the reassembled input file (such as a checksum value for the reassembled file, or some other function output based on the reassembled file). In the case of a parallel upload implemented via multipart upload, the intermediate function outputs may likewise be computed in parallel.
Pre-computing intermediate function outputs (such as checksums) in parallel or iteratively as the parts are uploaded enables the function output (e.g., checksum) of the complete file to be computed much faster after the upload completes, as compared to computing the function output of the complete, reassembled file only after upload and reassembly of the input file have finished. The term "reassembled" is used interchangeably with "unified" herein; for example, a reassembled file, object, or data may also be referred to as a unified file, object, or data.
Multipart uploading enables a client to split a file into separate parts and to upload those parts in parallel. Once all of the parts have been successfully uploaded, the client may submit a call to merge or reassemble the individual parts to form the original file. The client may also submit, with the call, a manifest indicating which portions are to be merged and the order in which the portions are to be merged.
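A toy version of the merge call can make this concrete, with an in-memory part store and a manifest listing part IDs in order (both structures are assumptions for illustration):

```python
def complete_multipart_upload(part_store, manifest):
    # Concatenate the uploaded parts in the order given by the manifest
    # to reassemble the original object.
    return b"".join(part_store[part_id] for part_id in manifest)
```

Note that the parts may have arrived in any order; the manifest, not arrival order, determines the reassembled result.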
One particularly useful application of this process is to determine the checksum of a large file based on the individual checksum values of the file portions, each of which may be uploaded in parallel. The checksum value is an error detection code determined from a set of data and is used to detect a change in the set of data. One such checksum value is determined using a cyclic redundancy check (e.g., CRC-32, which is a 32-bit cyclic redundancy check). The checksum algorithm enables calculation of a value or checksum of an object that is smaller than the object, but that will also change significantly if the object changes slightly. Thus, the checksum may be used to detect errors associated with the transfer of an object from one location to another. The routine shown in FIG. 9 may be used to calculate a checksum (or other value) of an input file based on individual checksums (or other values) determined from individual portions of the input file.
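Combining per-part CRC-32 values into the checksum of the concatenated whole, without re-reading the data, is a well-known technique; the sketch below is a Python port of zlib's `crc32_combine` (the GF(2) matrix method), shown as one illustration of how per-part checksums can yield the whole-file checksum:

```python
import zlib

def _gf2_matrix_times(mat, vec):
    # Multiply a 32x32 GF(2) matrix (list of column ints) by a vector.
    total = 0
    i = 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def _gf2_matrix_square(square, mat):
    for n in range(32):
        square[n] = _gf2_matrix_times(mat, mat[n])

def crc32_combine(crc1, crc2, len2):
    """CRC-32 of A+B given crc32(A), crc32(B), and len(B), per zlib's method."""
    if len2 == 0:
        return crc1
    even = [0] * 32
    odd = [0] * 32
    odd[0] = 0xEDB88320  # reflected CRC-32 polynomial: operator for one zero bit
    row = 1
    for n in range(1, 32):
        odd[n] = row
        row <<= 1
    _gf2_matrix_square(even, odd)  # operator for two zero bits
    _gf2_matrix_square(odd, even)  # operator for four zero bits
    # Append len2 zero bytes to crc1, squaring operators as len2 is consumed.
    while True:
        _gf2_matrix_square(even, odd)
        if len2 & 1:
            crc1 = _gf2_matrix_times(even, crc1)
        len2 >>= 1
        if len2 == 0:
            break
        _gf2_matrix_square(odd, even)
        if len2 & 1:
            crc1 = _gf2_matrix_times(odd, crc1)
        len2 >>= 1
        if len2 == 0:
            break
    return crc1 ^ crc2
```

With this, parts uploaded in parallel can each carry their own CRC-32, and the final value for the reassembled object is computed from the part checksums and part lengths alone, without touching the reassembled data.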
FIG. 9 is a flow diagram of an illustrative routine 900 that may be executed by an object storage service (such as object storage service 160), by a code execution service (such as on-demand code execution system 120) or a function running within a code execution service, or by both. The routine 900 may be used to dynamically process portions of input data (sometimes referred to as chunks, fractions, or data sub-objects) at runtime ("on the fly"). Such processing may occur as individual input data portions are uploaded to and stored as data object portions in an object storage service (such as, for example, object storage service 160), and in response to a request to merge the data object portions into a data object stored on the object storage service. Although the routine 900 is described with respect to computing a checksum value for an input file based on individual checksum values of independently uploaded portions of that input file, the routine may be used to determine a function output based on any initial or intermediate function outputs.
In some embodiments, the routine 900 may be used to automatically determine a checksum value for (or perform a first function on) each individual input data portion as it is uploaded, and prior to reassembly of the individual data object portions into a data object representing the complete input data. Determining the checksum value of each individual input data portion as it is uploaded, and prior to reassembly of the input data, may advantageously reduce the amount of time before the stored input data is ready for further processing or retrieval. For example, if an error occurs during upload of an input data portion, the error may be detected upon completion of the upload of that portion, rather than after reassembly of the complete input data. Such error detection may result in re-upload of only the input data portion exhibiting the error. Alternatively, a first condition may be detected using the first value determined from each input data portion as it is uploaded, instead of determining the first condition only after reassembly of the complete input data. Additional processing may also be performed on each input data portion based on the corresponding first value of that portion. In addition, a checksum value of the complete input data may be determined from the checksum values of the individual input data portions, rather than from the reassembled input data (e.g., after reassembly of the portions into a data object). Similarly, a second or final value associated with the complete input data may be determined from the first values of the separate data portions, by applying a second function to those first values, rather than from the reassembled input data. Determining the checksum (or second value) of the input data from the checksums (or first values) of the input data portions advantageously reduces latency and computing resource requirements. Aspects of the routine 900 will be described with additional reference to FIG. 10, which is a system diagram of illustrative data flows and interactions between various components of the service provider system 110.
The routine 900 may begin in response to an event, such as submission of a request from a client device 102 to upload input data to the object storage service 160. Illustratively, an owner of a collection of data objects to which the input data is to be added as a new data object may have previously specified that, when objects are uploaded to the collection using multipart upload, a first task should be executed to process each part of the uploaded data object, and a second task should be executed upon a request to reassemble the parts into the data object. In some implementations, the routine 900, or portions thereof, may be implemented on multiple processors, serially or in parallel.
At block 902, the object store service 160 may receive a request to store input data submitted via a multipart upload. Fig. 10 shows at (1) that the object store service 160 receives the request. The request illustratively includes parameters such as an identifier of the input data to be stored as a data object by the object storage service 160, a location where the data object is stored, context data regarding the request, other data, or some combination thereof. For example, the request may be a resource request (such as a PUT request) for particular input data to be stored in the object data store 166 of the object store service 160, which input data is to be provided via multipart upload.
At block 902, the object storage service 160 may also determine that portions of the input data to be stored in the object storage service 160 are to be used to generate a function output. In some embodiments, the determination may be based on the context data and/or the input data itself. For example, the object storage service 160 may receive an indication that a client is to use a multipart file transfer protocol to transfer input data to the storage service 160, or may require that a multipart file transfer protocol be used to upload the input data to the object storage service 160. In this case, the object storage service 160 can determine an object identifier (e.g., an object ID) for the multipart input data to be transmitted. The object storage service 160 provides the object ID to the client. In some embodiments, the input data is not transmitted using a multipart file transfer protocol. Rather, the input data is transmitted in portions (e.g., objects, sub-objects, files, etc.), but not necessarily according to a multipart file transfer protocol. A manifest or list may be provided to identify the portions that are to be subsequently joined, and the order in which they are to be joined, to reassemble the complete input data from the input data portions.
At block 904, the object store service 160 may receive a portion of the input data from the client. In one particular non-limiting embodiment, the input data may be a file, a compound file (e.g., a compressed file, such as a file compressed according to a .zip, .tar, or other compressed file format), a composable object, or a super composable object consisting of individual objects or sub-objects. Each input data portion is received with associated metadata that may include the object ID and an indication of one or more functions to be performed on the input data portion, the complete input data, or both. For example, the metadata may include a checksum value ("received CV") associated with the data object portion. The received input data portion, object ID, and metadata (e.g., received CV) may be stored in one or more scratch pads by the object storage service 160. A scratch pad is a data storage location, and may be a data storage device accessible via a block storage service, a local disk, the object data store 166 of the object storage service 160, or another data storage location. The received input data portion, object ID, and metadata may be stored in the same or different scratch pads. Additionally, multiple input data portions may be received in parallel by the object storage service 160 during at least partially overlapping time periods. Further, the input data portions may be received in a different order than the order in which the input data portions are assembled into the complete data object. Thus, the metadata may include an input data portion identifier (input data portion ID) that may be used to specify the input data portions to be used and the order in which the input data portions are to be arranged to assemble the complete input data. Further, the input data portions may be the same size as, or different sizes from, each other.
The indication of one or more functions to be performed on the input data portion, the complete input data, or both may include an indication to manipulate and/or validate the input data portion, the input data, or both, prior to storing the input data within the object data store 166 of the object storage service 160. For example, the indication may indicate that the input data portion, the entire input data, or both, will be compressed, decompressed, encrypted, decrypted, or a combination thereof, before being stored within the object data store 166 of the object storage service 160. Additionally, the indication may indicate that the input data portion, the complete input data, or both, will be error checked prior to subsequent manipulation. For example, the input data portion may be individually error checked or checksum checked before being reassembled into the complete input data. Additionally, the reassembled input data may be checksum checked prior to being stored in the object data store 166. In some embodiments, the object store service 160 may automatically perform error checking on each input data portion and/or the complete input data without receiving an indication that instructs the object store service 160 to do so. Upon complete receipt of each input data portion, the object store service 160 can initiate error detection of that input data portion without waiting for the complete input data to be reassembled. Fig. 10 shows at (2) the object store service 160 receiving and storing a portion of the input data.
At block 906, the object store service 160 may make a call to the execution environment 502 to execute a function (e.g., a first function) to determine a checksum value of the input data portion (or perform a different calculation or determination using the input data portion). FIG. 10 shows at (3) that the object store service 160 makes a call to the execution environment 502 and the execution environment 502 (or a function running within the execution environment 502) returns the result. In response to the call, VM instance 150 or other execution environment 502 may execute the function using the input data portion. For example, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may determine a checksum value ("determined CV") associated with the input data portion. The execution environment 502 (or a function running within the execution environment 502) may perform any of a variety of error detection operations on the input data portion, including cyclic redundancy check (e.g., CRC-32) or any other parallelizable error detection operation. The parallelizable error detection operation is an error detection operation that may be performed on portions of the input data, and the individual outputs of the error detection operations may be combined or otherwise used to determine a checksum or other data integrity indication associated with the complete input data. Each determined CV may be stored in any of a number of ways, including storing it as metadata with the input data portion, with a relational or non-relational database service, using a relational or non-relational database management system, or with an object storage service.
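As an illustrative sketch (not the service's actual implementation), the per-portion checksum function described above might be written as follows, using CRC-32 as the parallelizable error-detection operation; the function name and sample payload are hypothetical:

```python
import zlib

def part_checksum(part: bytes) -> int:
    """Determine the checksum value ("determined CV") for one input data portion.

    CRC-32 is one example of a parallelizable error-detection operation:
    each portion can be checksummed independently (possibly in a separate
    execution environment), and the per-portion values can later be combined
    or re-checksummed to validate the reassembled input data.
    """
    return zlib.crc32(part) & 0xFFFFFFFF

# Each uploaded portion's determined CV is checked against the CV sent by
# the client; a mismatch would trigger a retransmit request.
received_cv = zlib.crc32(b"portion-1 payload") & 0xFFFFFFFF
determined_cv = part_checksum(b"portion-1 payload")
assert determined_cv == received_cv
```

The determined CV for each portion could then be stored as metadata alongside the portion in the scratch pad, as described above.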
At block 908, the object store service 160 may process the output of the function received from the execution environment 502 (or a function running within the execution environment 502). For example, the object store service 160 can use output data (such as a determined CV) received from the execution environment 502 (or a function running within the execution environment 502) to perform error detection on (or perform some other calculation or determination using) the input data portion. Error detection may include comparing the determined CV to the stored received CV. If the two values are different, the object store service 160 can determine that an error occurred during the upload of the input data portion and can request that the client retransmit the associated input data portion. Fig. 10 shows at (4) the object store service 160 processing the output of the function.
In some implementations, the object store service 160 can provide the determined CV (or first value) associated with the input data portion to the client. The client may receive the determined CV and compare it to the client-determined checksum value of the input data portion (or otherwise process the first value). If the two values are different, the client may determine that the associated input data portion needs to be resent to the object store service 160. In this case, the client will indicate to the object store service 160 that the input data portion is being retransmitted.
In some embodiments, instead of performing a checksum determination function on each input data portion received by the object store service 160, the execution environment 502 (or a function running within the execution environment 502) is configured to perform the function on fixed-size portions of the input data (or of an input data portion). The size of the fixed-size portions may be configurable by the client. For example, the size may be specified using a parameter that is sent to the object store service 160 in connection with initiating the multipart upload process for the input data. In some embodiments, the size is predetermined by the object store service 160 or the execution environment 502 (or a function running within the execution environment 502).
For example, a client may wish to upload a 10GB file as input data using a multipart upload process. The client may transmit the input data in multiple portions, each portion having the same or a different size. For example, a client may transmit the input data in ten 1GB portions. The execution environment 502 may process each portion as it is received (as discussed above), or alternatively, the execution environment may process fixed-size pieces of each portion. For example, the execution environment 502 may process each 100MB (or other predetermined fixed size) of each 1GB portion as it is received.
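The fixed-size processing just described can be sketched as follows; the 100MB default and the helper names are assumptions for illustration. Note that CRC-32 can be computed incrementally over successive fixed-size pieces, so a portion's checksum can be determined without holding the whole portion in memory:

```python
import zlib

FIXED_CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB; assumed client-configurable default

def fixed_size_chunks(portion: bytes, chunk_size: int = FIXED_CHUNK_SIZE):
    """Yield fixed-size pieces of a received input data portion.

    Operating on known fixed-size inputs lets the execution environment
    provision scratch pad storage of a predictable size for each piece.
    """
    for offset in range(0, len(portion), chunk_size):
        yield portion[offset:offset + chunk_size]

def checksum_portion_in_chunks(portion: bytes, chunk_size: int = FIXED_CHUNK_SIZE) -> int:
    """Checksum a portion incrementally, one fixed-size piece at a time."""
    cv = 0
    for chunk in fixed_size_chunks(portion, chunk_size):
        cv = zlib.crc32(chunk, cv)  # running CRC-32 over successive pieces
    return cv & 0xFFFFFFFF
```

Because the running CRC-32 over the pieces equals the CRC-32 of the whole portion, the fixed-size pieces can be processed as they arrive without changing the resulting determined CV.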
Such fixed-size portion processing may advantageously enable execution environment 502 to operate on known fixed-size inputs. This configuration greatly simplifies provisioning scratch pad storage for each fixed-size piece of an input data portion being processed, and makes that provisioning more efficient. In some embodiments, fixed-size portion processing is automatically used if the complete input data size exceeds a threshold, or if the input data portion size exceeds a threshold.
Blocks 904 through 908 define a parallelizable block 909 that may be iterated multiple times in parallel or sequentially, or both. For example, the blocks of block 909 may be performed for each input data portion received from the client and in parallel (e.g., during at least partially overlapping time periods).
At block 910, object store service 160 (or VM instance 150, other execution environment 502, or a function running within execution environment 502) may receive a request to execute a second function based on at least a portion of the first output. For example, the object store service 160 may receive a request to determine a checksum of reassembled input data from stored input data portions submitted via a multipart upload, or a request to reassemble input data from stored input data portions. Fig. 10 shows at (5) that the object store service 160 receives the request. The request illustratively includes parameters such as an identifier of the portion of the input data to be reassembled and stored as a data object by the object store service 160, a location where the data object is stored, context data regarding the request, other data, or some combination thereof. For example, the request may be a resource request, such as a PUT request.
At block 910, the object store service 160 may also determine that the portion of the input data stored in the object store service 160 is to be used to generate the function output. In some embodiments, the determination may be based on the context data and/or the input data itself. For example, the object store service 160 may receive an indication to combine previously received portions of input data together. A manifest or list may be provided to identify the portions to be joined together and the order in which they are to be joined together to reassemble the complete input data from the previously uploaded portions.
At block 912, the object store service 160 may make a call to the execution environment 502 (or the function running within the execution environment 502) to determine the checksum of the reassembled input data by using the separate checksums (or first values) determined for each of the input data portions (or to execute the second function using those separate checksums). FIG. 10 illustrates at (6) that the object store service 160 makes a call to the execution environment 502 (or a function running within the execution environment 502) to determine a checksum of the reassembled input data. In one embodiment, the object store service 160 receives a manifest from a client identifying the input data portions to be reassembled into the complete input data. In addition, the manifest identifies the order in which the input data portions are reassembled into the complete input data. For example, individual input data portions may not be received in order, and the manifest may be used to determine the correct ordering of the input data portions within the complete input data. The checksum value for each of the portions of input data identified in the manifest is provided to the execution environment 502 with the call to determine a checksum of the reassembled input data. The execution environment 502 may perform a function to determine the checksum of the reassembled input data by combining the separate checksums or by determining a checksum of the separate checksum values. The execution environment 502 (or a function running within the execution environment 502) may return the checksum of the reassembled input data to the object store service, as shown at (6) in fig. 10.
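One way to realize the "checksum of the separate checksum values" approach mentioned above is sketched below; the byte packing and function name are illustrative assumptions, not a format defined by the service:

```python
import zlib

def combined_checksum(part_cvs: list) -> int:
    """Determine a checksum for the reassembled input data from the
    per-portion checksum values ("first values"), without re-reading
    the portions themselves.

    This sketch takes the checksum-of-checksums approach: the individual
    CRC-32 values, taken in manifest order, are packed into bytes and
    checksummed as a whole.
    """
    packed = b"".join(cv.to_bytes(4, "big") for cv in part_cvs)
    return zlib.crc32(packed) & 0xFFFFFFFF

# Per-portion CVs, listed in the order given by the manifest.
part_cvs = [zlib.crc32(b"part1") & 0xFFFFFFFF, zlib.crc32(b"part2") & 0xFFFFFFFF]
whole_cv = combined_checksum(part_cvs)
```

Because the combined value depends on the order of the per-portion values, the manifest ordering described above matters to the result.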
At decision block 914, object store service 160 processes the output of the function. For example, the object store service 160 may use the checksum of the reassembled input data to perform error detection, or it may provide the output to the client to enable the client to perform error detection. In some embodiments, the object store service 160 may process the output of the function by storing the output as an object within the object data store 166. If an error is detected, the client may retransmit one or more portions of the input data. If no errors are detected, the object store service 160 reassembles the complete input data from the stored input data portions based on the contents of the manifest. Fig. 10 shows at (7) the object store service 160 processing the output of the function and reassembling the complete input data from the stored input data portions.
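A minimal sketch of manifest-driven reassembly, under the assumption that stored portions are addressable by input data portion ID, might look like this:

```python
def reassemble(portions: dict, manifest: list) -> bytes:
    """Reassemble complete input data from stored portions.

    `portions` maps input data portion IDs to stored portion bytes (e.g., in
    a scratch pad); `manifest` lists the portion IDs in assembly order, which
    need not match the order in which the portions were received.
    """
    missing = [pid for pid in manifest if pid not in portions]
    if missing:
        raise ValueError(f"portions not yet uploaded: {missing}")
    return b"".join(portions[pid] for pid in manifest)

# Portions received out of order are put back in manifest order.
stored = {"p2": b"world", "p1": b"hello, "}
assert reassemble(stored, ["p1", "p2"]) == b"hello, world"
```

The error path mirrors the retransmission behavior described above: a portion missing from the scratch pad (or failing its checksum) would lead the service to request that the client resend it.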
At block 916, the object store service 160 may store the reassembled input data as a data object in the object data store 166. FIG. 10 shows at (8) that the object store service stores the reassembled input data as a data object.
Blocks 912 through 916 are shown as occurring in sequence. However, the order in which these blocks occur may be different, or two or more blocks may be performed simultaneously or during at least partially overlapping time periods. For example, in some embodiments, block 912 may be performed concurrently (or partially concurrently) with block 914 and/or block 916. In some embodiments, blocks 914 and 916 may also occur prior to block 912.
The routine may terminate at block 918.
In some embodiments, a client sends a request to an object store service (such as object store service 160) to write input data or files as data objects to a storage location, such as an object data store (including object data store 166). For example, a client may wish to store a collection of customer records that include personal customer information (e.g., a customer's government-issued identity number, social security number, etc.). The client may wish to obscure the customer records prior to storage so that users can retrieve only versions of the customer records in which the personal customer information has been obscured. The client may wish to allow only a few users with elevated security credentials to access the unobscured personal customer information. In another example, the input data may include a medical image (e.g., a photograph, an x-ray photograph, a sonogram, an ultrasound image, etc.), wherein a portion of the image includes personally identifiable information, such as a patient's name. The client may wish to obscure the personally identifiable information in the medical image. The client request may include the input data, or information that may be used by the object store service 160 to obtain the input data. In response to the request, the object store service 160 may store the input data in a scratch pad, such as any of the scratch pads discussed above. Once the input data has been staged, a routine for obscuring the input data, such as routine 1100 of FIG. 11, may be initiated.
FIG. 11 is a flow diagram of an illustrative routine 1100 that may be performed by a code execution service (or a function running within the execution environment 502), such as the on-demand code execution system 120, to dynamically tokenize, mask, scramble, encrypt, or otherwise render unintelligible (for convenience, collectively referred to herein as "obscure") portions of input data at runtime in response to a request to store or write the input data. Obscuring also includes replacing (e.g., selectively replacing) one or more portions of the input data with different unique data, such as tokens. The token for each instance of the data that is replaced (e.g., each instance of private information) is different from each other token. In other words, a one-to-one mapping of tokens to each instance of private information may be provided. The routine may be implemented as a function of the on-demand code execution system 120, and a user may append or insert the function into the input-output path of a given set of objects. A request to write input data includes a request to write or store the input data as a data object in a storage location, such as an object data store, including the object data store 166. The client may wish to store a data set that includes private information and non-private information. However, the client may wish to store the data set in a manner that separates the private information from the non-private information, and to store the private information and the mapping between the tokens and the private information in a secure location accessible by only a limited number of authorized individuals or resources. The client may also wish to provide access to the non-private information to a larger group of individuals or resources, or to store the non-private information in a less secure location. Aspects of routine 1100 will be described with reference to FIG. 12, which is a system diagram of illustrative data flow and interactions between various components of the service provider system 110.
Routine 1100 may begin in response to an event, such as when the routine shown in fig. 8 reaches block 808. For example, routine 1100 may be an owner-defined function or the like (also referred to as a user-defined task) that is executed by VM instance 150 or other execution environment 502 generated during the routine shown in FIG. 8. In some implementations, routine 1100, or portions thereof, may be implemented serially or in parallel on multiple processors.
At block 1102, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may receive parameters associated with a request to write input data. FIG. 12 illustrates at (1) that execution environment 502 (or a function running within execution environment 502) receives parameters associated with a request. In some embodiments, the parameters may include reference data including a reference to input data to be stored as a data object, a reference to an output location of the data object, context data regarding the request, other data or metadata, or some combination thereof. For example, the request may be a resource request, such as a PUT request, to store input data as a particular data object in object store service 160. The reference to the input data may be data that the execution environment 502 (or a function running within the execution environment 502) may use to access the input data, such as a file descriptor, a file handle, a pointer, or some other data representing an address or identifier of the input data. The reference to the output location may be data that the execution environment 502 (or a function running within the execution environment 502) may use to write, store, or otherwise persist output data, such as a file descriptor, a file handle, a pointer, or some other data representing an address or identifier of the location used to provide the output of the function. The context data or metadata may include data or metadata about the context of the request, such as an identifier of the user, account, or other source of the request, an identifier of the access or security profile from which the request was made, data representing access or security rights to which the request will be processed, an identifier of a location associated with the request, an identifier of a language associated with the request, or data representing preferences or tendencies of the source of the request.
While FIG. 12 depicts the object store service providing parameters, such as a reference to the requested data object or a reference to an output location, to the execution environment 502 (or a function running within the execution environment 502), in other cases these references may be provided by elements of the execution system 120 (such as the scratch code 157).
At block 1104, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may use the reference data to obtain input data to be stored as a data object. The input data may be obtained in an unobscured or substantially unobscured form. FIG. 12 shows at (2) that the execution environment 502 (or a function running within the execution environment 502) obtains input data. In some embodiments, at block 1104, the input data may not be obtained from the object store service 160, but may be provided to the execution environment 502 (or a function running within the execution environment 502) in advance. For example, during staging of the execution environment, input data may be obtained and stored on the computing device of the execution environment 502 at a location indicated by the reference data.
At block 1106, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may determine that one or more portions of the input data are to be obscured. In some embodiments, the determination may be based on context data and/or the input data. For example, if a portion of the input data appears to be, is determined to be, or is likely to be in the form of private or personally identifiable information, the execution environment 502 (or a function running within the execution environment 502) may determine that such portion is to be obscured. The execution environment 502 (or a function running within the execution environment 502) may test one or more items of context data against one or more criteria to determine whether to perform obscuring and which portions of the input data to obscure. If a context data item meets one or more criteria, the execution environment 502 (or a function running within the execution environment 502) may determine that one or more portions of the input data are to be obscured such that the obscured portions are unintelligible to the recipient. FIG. 12 shows at (3) the execution environment 502 (or a function running within the execution environment 502) determining the portions of the input data to obscure.
Testing the context data against the criteria may include determining that the input data includes private or personally identifiable information, including, but not limited to, an individual's name, address, age, government-issued identity number, social security number, date of birth, place of birth, mother's maiden name, biometric information, health information, or Vehicle Identification Number (VIN), or that the input data includes information that has been designated as confidential.
In one particular non-limiting embodiment, the input data may be a data file, such as a spreadsheet, a delimited file, or another collection of data records. If the request meets one or more criteria, portions of the data file (such as a collection of records, a collection of columns or data fields, etc.) will be stored in an obscured form. The execution environment 502 (or a function running within the execution environment 502) may determine that attributes of the request, indicated by the context data or otherwise associated with the request, meet the criteria for particular records, columns, and/or fields of the requested data object. The execution environment 502 (or a function running within the execution environment 502) may determine, based on the criteria, that particular records, columns, and/or fields of the input data are to be obscured before being output by the function (e.g., to be stored as a data object).
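As a hedged illustration of testing records against criteria, the sketch below flags fields of a delimited-file record either by field name or because the value appears to be in the form of private information (here, a social-security-number pattern); the field names and regular expression are hypothetical criteria, not ones defined by the service:

```python
import re

# Hypothetical criteria: field names and value patterns that indicate
# private or personally identifiable information.
PRIVATE_FIELDS = {"ssn", "date_of_birth", "mothers_maiden_name"}
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def fields_to_obscure(record: dict) -> set:
    """Determine which fields of a record are to be obscured.

    A field qualifies if its name matches a known private-field name or its
    value appears to be in the form of private information (here, a social
    security number).
    """
    flagged = set()
    for name, value in record.items():
        if name.lower() in PRIVATE_FIELDS or SSN_PATTERN.match(value):
            flagged.add(name)
    return flagged

record = {"name": "A. Customer", "ssn": "909-09-0909", "city": "Seattle"}
assert fields_to_obscure(record) == {"ssn"}
```

In practice the criteria might also consult the context data (requester identity, security profile, and so on) before deciding which fields to obscure.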
At block 1108, VM instance 150 or other execution environment 502 may optionally apply obscuring to the portions of the input data determined above. FIG. 12 shows at (4) the execution environment 502 (or a function running within the execution environment 502) obscuring portions of the input data. Obscuring the content of a portion of the input data may involve using one or more obscuring methods, such as scrambling the content in a pseudo-random manner, generating a hash of the content, replacing the content with a token mapped to the content in a data store (such as the object storage service 160), encrypting the portion, and so forth. In some embodiments, encryption is performed using a key under control of the data object owner, and the encryption keys are managed using a key management service. In some embodiments, different obscuring methods may be used for different portions of a data object, different data objects, different contextual data criteria, and so forth.
For example, in one embodiment, the obfuscation method may include replacing a portion of the input data with a token mapped to a key-value pair that is secured in a secure location (such as an external database). For example, the social security number "909-09-0909" may be replaced with a globally unique identifier (such as "001"), and a different database may store key-value pairs that map the key "001" to "909-09-0909".
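The token-to-key-value mapping in this example can be sketched as follows; the sequential three-digit token format mirrors the "001" example above but is otherwise an assumption. Note that each instance of private information receives its own token, giving the one-to-one mapping described earlier:

```python
import itertools

class Tokenizer:
    """Replace each instance of private information with a unique token.

    The token-to-value mapping is the key-value index that would be kept in
    a secure location (e.g., an external database); tokens here are
    sequential identifiers purely for illustration.
    """
    def __init__(self):
        self._counter = itertools.count(1)
        self.index = {}  # token -> private value (the function's second output)

    def tokenize(self, value: str) -> str:
        token = f"{next(self._counter):03d}"
        self.index[token] = value
        return token

tok = Tokenizer()
assert tok.tokenize("909-09-0909") == "001"   # matches the example above
assert tok.index["001"] == "909-09-0909"
```

Because every call mints a fresh token, two records containing the same social security number would receive distinct tokens, preventing correlation of records through repeated token values.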
In some embodiments, the obfuscation method may be specified by an entity that owns or is responsible for the data object that is requested to be stored (e.g., as part of the request to store the input data as the data object). For example, one entity may specify that a particular type of obfuscation (e.g., an industry-standard obfuscation method in the medical arts) be used for a data object or data object bucket, while another entity may specify that a different type of obfuscation (e.g., tokenization using a token-to-data mapping) be used for a different data object or data object bucket. If no obfuscation method is specified, the execution environment 502 (or a function running within the execution environment 502) may apply a default obfuscation method.
At block 1110, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may provide the selectively obscured input data as an output of the function. For example, the execution environment 502 (or a function running within the execution environment 502) may place the selectively obscured input data at the output location indicated by the reference data and finalize the output. Finalizing the output of the function may include providing a return value (e.g., indicating success, failure, or some other characteristic of function execution) to the object store service 160 and/or finalizing an output stream or file identified by the reference to the output location. Additionally, at block 1110, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may also provide an index as second output data. The index may include a mapping between the tokens and the obscured private information. The index may then be stored using the object store service, a different object store service, or a different store service (such as a database store service or any other store service). FIG. 12 shows at (5) that execution environment 502 (or a function running within the execution environment 502) provides selectively obscured input data as output. Routine 1100 may terminate at block 1112.
Obscuring portions of a data object at write time provides certain data management advantages. For example, if the input data includes customer records, such as purchase history, personally identifiable information, and other private and non-private information, then a data object including an obscured version of that information may be updated more easily in the event that a particular customer deletes her account. For example, rather than having to scan the entire data object to locate and remove the deleted customer's private information, the system may instead delete the token mappings (or token-to-key-value-pair mappings, as discussed above) associated with the deleted customer from the token map, or delete the customer's private information from the location where such private information is stored.
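Under the tokenization approach, honoring an account deletion reduces to removing mapping entries, as this small sketch (with a hypothetical in-memory index) illustrates:

```python
def delete_customer(index: dict, customer_tokens: set) -> None:
    """Honor an account deletion by removing the customer's token mappings.

    The stored data object, which contains only tokens, is left untouched;
    once the mappings are gone, the remaining tokens no longer resolve to
    private information.
    """
    for token in customer_tokens:
        index.pop(token, None)  # tolerate already-removed tokens

index = {"001": "909-09-0909", "002": "111-11-1111"}
delete_customer(index, {"001"})
assert index == {"002": "111-11-1111"}
```

In a deployed system the index would live in a secured store (e.g., a database service) rather than memory, but the deletion pattern is the same: no scan or rewrite of the data objects is needed.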
FIG. 13 is a flow diagram of an illustrative routine 1300 that may be performed by a code execution service (such as the on-demand code execution system 120) to dynamically determine and store an index of content of input data at runtime in response to a request to store the input data as a data object. The client may wish to retrieve only a portion of the compound file stored as a data object in the object store service. By providing an index that identifies different files or data sets or items within the compound file and their locations, the object store service is able to retrieve and provide only the desired portion to the client. Aspects of routine 1300 will be described with reference to fig. 14, which is a system diagram of illustrative data flow and interactions between various components of service provider system 110.
In some embodiments, a client sends a request to an object store service (such as object store service 160) to write input data as a data object to a storage location, such as an object data store (including object data store 166). For example, a client may wish to store input data that includes a compound file (such as a compressed file, sometimes referred to as a zip archive or tar archive, or another file made up of a collection of individual data elements). The compound file may include one or more individual files, each of which is compressed. The compound file may also include an index of the content of the compound file. The index may include the name of each of the individual files within the compound file, as well as other metadata regarding the content of the compound file. The index may also provide a mapping between the content of the compound file and the byte-range location of each item of content. Thus, the index enables a user to use a "byte range GET" to request only the bytes of a desired file or other content of the compound file. In other examples, the compound file does not include an index of its content. In yet another example, the input data is not a compound file, but the object store service 160 is configured to generate a storable data object corresponding to a compressed version of the input data and store the compressed version within the object store service. The client request may include the input data, or information that may be used by the object store service 160 to obtain the input data. In response to the request, the object store service 160 may store the input data in a scratch pad, such as any of the scratch pads discussed above. Once the input data has been staged, a routine for indexing the input data, such as routine 1300 of FIG. 13, may be initiated.
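A sketch of building such an index for a .zip compound file is shown below, using Python's standard zipfile module; mapping each entry name to the offset of its local entry and its compressed size is one plausible index layout, not necessarily the layout the service uses:

```python
import io
import zipfile

def index_compound_file(data: bytes) -> dict:
    """Build an index of a compound (.zip) file's content.

    Maps each individual file name to (offset of its local entry, compressed
    size), which a client could use to issue a byte-range GET for just the
    desired entry rather than retrieving the whole data object.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: (info.header_offset, info.compress_size)
                for info in zf.infolist()}

# Build a small compound file and index it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("b.txt", "beta")
index = index_compound_file(buf.getvalue())
assert set(index) == {"a.txt", "b.txt"}
assert index["a.txt"][0] == 0  # first local entry starts at byte 0
```

A real byte-range GET would also need to cover the local file header that precedes the entry's data, but the offset-and-size pairs above are sufficient for a client to locate each entry within the stored object.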
Routine 1300 may begin in response to an event, such as when the routine shown in fig. 8 reaches block 808. For example, routine 1300 may be an owner-defined function or the like (also referred to as a user-defined task) that is executed by VM instance 150 or other execution environment 502 generated during the routine shown in FIG. 8. In some implementations, the routine 1300, or portions thereof, may be implemented serially or in parallel on multiple processors.
At block 1302, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may receive parameters associated with a request to store input data as a data object. FIG. 14 illustrates at (1) that execution environment 502 (or a function running within execution environment 502) receives parameters associated with a request. In some embodiments, the parameters may include reference data including a reference to input data to be stored as a data object, a reference to an output location of the data object, context data regarding the request, other data or metadata, or some combination thereof. For example, the request may be a resource request, such as a PUT request, to store input data as a particular data object in object store service 160. The reference to the input data may be data that the execution environment 502 (or a function running within the execution environment 502) may use to access the input data, such as a file descriptor, a file handle, a pointer, or some other data representing an address or identifier of the input data. The reference to the output location may be data that the execution environment 502 (or a function running within the execution environment 502) may use to write, store, or otherwise persist output data, such as a file descriptor, a file handle, a pointer, or some other data representing an address or identifier of the location used to provide the output of the function. The context data or metadata may include data or metadata about the context of the request, such as an identifier of the user, account, or other source of the request, an identifier of the access or security profile from which the request was made, data representing access or security rights to which the request will be processed, an identifier of a location associated with the request, an identifier of a language associated with the request, or data representing preferences or tendencies of the source of the request.
While FIG. 14 depicts the object storage service providing parameters, such as the reference to the requested data object or the reference to the output location, to the execution environment 502 (or a function running within the execution environment 502), in other cases these references may be provided by elements of the execution system 120 (such as the staging code 157).
At block 1304, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may use the reference data to obtain input data. FIG. 14 shows at (2) that the execution environment 502 obtains input data. In some embodiments, at block 1304, the input data may not be obtained from the object store service 160, but may instead be provided to the execution environment 502 (or a function running within the execution environment 502) in advance. For example, during staging of the execution environment, input data may be obtained and stored on the computing device of the execution environment 502 at a location indicated by the reference data.
At block 1306, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) generates an index of the content of the input data. In some embodiments, the index is generated by obtaining the names of individual files stored within the input data. For example, the input data may include an index of the content of the data object. If not, the execution environment 502 (or a function running within the execution environment 502) may read and store the name of each file within the input data. In some implementations, files within the input data are extracted or decompressed so that file names and/or file content may be determined. In some embodiments, the execution environment 502 (or a function running within the execution environment 502) uses metadata or a header stored within the input data to generate an index of the content of the input data. In some embodiments, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) is configured to unpack or recursively unpack input data to determine the content of the input data (e.g., an identifier of a delimited element within the input data, and a byte-range location of the delimited element within the input data, the delimited element being a file or any other delimited element described herein). Recursive unpacking may include analyzing a second compound file located within a first file. VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may unpack the first file to identify a second file (or multiple second files) and then unpack the second file to determine an identifier and byte range (or other) location of the delimited element within the second file. In some embodiments, the execution environment 502 (or a function running within the execution environment 502) generates an index of the content of the input data by analyzing text within the input data.
The index includes content identifiers (e.g., file names, text fields, header information, metadata, etc.) and location information associated with each identifier. For example, the index may include a list of all files within the input data, as well as the location (e.g., byte range, etc.) of each file within the input data. In another example, the index may include a list of all of the headers of the data sets within the input data (e.g., sales data for various geographic areas), as well as the location (e.g., byte range, etc.) of each data set within the input data. In addition, when the input data includes a compound file, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may determine a file aggregation technique for forming the compound file. For example, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may determine whether a compound file is in a .zip, .tar, or other format by analyzing bytes within the file. For example, some aggregation techniques generate files having a known header format. Accordingly, VM instance 150 or other execution environment 502 (or functions running within execution environment 502) may dynamically evaluate input data based on these bytes (sometimes referred to as file aggregation technique information) and use this information to determine how to further read and interpret the remainder of the input data. For example, file aggregation technique information may be used to determine whether to perform recursive unpacking of files, such as discussed above. FIG. 14 shows at (3) that the execution environment 502 (or a function running within the execution environment 502) determines an index of the input data content.
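The disclosure does not prescribe a particular archive library or index layout. As one stdlib-only sketch, format detection from leading bytes and byte-range indexing might look like the following for `.tar` input (the offsets are exact for uncompressed tar archives, since Python's `tarfile` exposes each member's content offset):

```python
import tarfile

def detect_format(path):
    # Inspect leading bytes (the "file aggregation technique information")
    # to decide how to read the rest of the input data.
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"PK\x03\x04":
        return "zip"                  # local-file-header magic of a .zip
    if tarfile.is_tarfile(path):      # checks the "ustar" magic at offset 257
        return "tar"
    return "unknown"

def index_tar(path):
    # Map each file identifier (member name) to the inclusive byte range of
    # its content within the compound file, so a desired portion can later
    # be served with a byte-range read instead of unpacking the whole object.
    index = {}
    with tarfile.open(path, "r") as tf:
        for member in tf.getmembers():
            if member.isfile():
                index[member.name] = (member.offset_data,
                                      member.offset_data + member.size - 1)
    return index
```

For `.zip` or compressed archives the per-member offsets would instead come from the central directory, and recursive unpacking would apply `detect_format` again to any member that is itself a compound file.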
At block 1308, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may provide the index as an output of the function. For example, the execution environment 502 (or a function running within the execution environment 502) may return the index to the object store service 160. In some embodiments, VM instance 150 or other execution environment 502 (or a function running within execution environment 502) may return the input data instead of, in addition to, or in conjunction with the index (as second output data). FIG. 14 shows at (4) that execution environment 502 (or a function running within execution environment 502) provides the index as an output.
At block 1310, the object store service 160 may process the function output. For example, the object store service 160 (or a different service) may store an index. FIG. 14 shows at (5) that the object store service 160 stores the index. The index may be stored at any of a variety of locations. For example, the object store service 160 may store the index in an object data store (such as object data store 166 of the object store service 160). In another example, object store service 160 may store the index as a table using a relational or non-relational data store service or database management system. In yet another example, an index may be appended or otherwise added to the input data, and updated input data (with the index) may be stored by the object store service 160.
In some embodiments, at block 1310, the object store service 160 may create a data object corresponding to the input data and add metadata to the data object that includes a reference to the index. The reference may include an indication that there is an index associated with the data object. In another embodiment, data objects corresponding to input data and indexes may be associated with each other via a naming convention. For example, the data objects and indexes may have similar identifiers or name parts, such as prefixes, suffixes, or other identifiers. The reference may be used by a subsequent user of the data object to obtain a desired portion of the data object. For example, a user may retrieve an index and select a desired portion of a data object. The object store service 160 and execution environment 502 (or functions running within the execution environment 502) can use the desired portion indicated by the user and the index to identify the location (e.g., byte range) of the desired portion of the data object within the data object. The object store service 160 and execution environment 502 (or functions running within the execution environment 502) may use the location to retrieve (e.g., extract or decompress) a desired portion of the data object (e.g., via performing a byte-range query or GET, etc., on the stored data object) and provide it to the user.
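Given such an index, serving a desired portion reduces to a byte-range read. A minimal sketch (a local file stands in for the stored data object, and the seek-and-read stands in for a byte-range GET):

```python
def get_member(object_path, index, name):
    # Use the index to translate a file identifier into a byte range,
    # then read only that range of the stored data object.
    start, end = index[name]             # inclusive range, as in an HTTP Range header
    with open(object_path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)
```

If the member was stored compressed, the bytes returned here would still need to be decompressed before being provided to the user, as the description notes.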
The routine may terminate at block 1312.
FIG. 14 illustrates an execution environment 502 (or a function running within the execution environment 502) indexing a data object for storage in the object storage service 160 in response to receiving a request to store the data object. Although the transformation is illustrated as occurring in conjunction with the operations of routine 1300 for indexing data objects, such a transformation may be performed in conjunction with any other routine described herein, in conjunction with any other owner-defined function or owner-defined task, in a pipeline having multiple functions, etc.
Example embodiments
Examples of embodiments of the present disclosure may be described in terms of:
clause 1. A system, the system comprising:
An object storage service comprising one or more computing devices, wherein the object storage service is configured to store a plurality of data objects within an object data store, and
A code execution service comprising one or more computing devices for executing functions on demand in an input/output (I/O) path of the object store service;
Wherein the object store service is configured to at least:
Receiving input data and a request to store the input data as a data object within the object data store;
Determining that a function for blurring a portion of the input data associated with the request to store the input data is to be performed prior to storing the input data as the data object, and
Transmitting a call to the code execution service to execute the function on the input data, and
Wherein the code execution service is configured to at least:
receiving the call from the object store service to execute the function, the call including the input data, and
Executing the function, wherein executing the function causes the code execution service to:
Identifying within the input data one or more instances of private information to be obscured prior to storing the input data as the data object within the object data store;
Generating output data comprising the one or more instances of the private information of the input data in obscured form and the remaining portion of the input data in unobscured form, and
Returning the output data to the object store service;
Wherein the object store service is further configured to store the output data as the data object in the object data store.
Clause 2 the system of clause 1, wherein the input data does not remain stored in the object storage service in an unobscured form after the object storage service stores the output data.
Clause 3 the system of clause 1, wherein the code execution service is configured to generate the output data by:
determining a unique token for each instance of the private information, wherein each unique token is different from each other unique token;
storing a mapping of each unique token to the corresponding instance of the private information, and
Each instance of the private information is replaced with a corresponding unique token.
Clause 4 the system of clause 1, wherein the code execution service is configured to generate the obscured form of the private information by encrypting the private information.
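Clause 3's token-substitution scheme can be sketched in a few lines of Python. The SSN-style pattern below is just one illustrative class of private information, not something the clauses prescribe:

```python
import re
import uuid

PRIVATE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # illustrative: SSN-like strings

def obscure_with_tokens(text):
    # Replace each instance of private information with a unique token and
    # retain a token -> original mapping (which, per clause 9, would be
    # stored under stricter access permissions than the data object itself).
    mapping = {}
    def replace(match):
        token = "<TOKEN:%s>" % uuid.uuid4().hex   # unique per instance
        mapping[token] = match.group(0)
        return token
    return PRIVATE.sub(replace, text), mapping
```

An authorized reader holding the mapping can reverse the substitution token by token; readers of the stored data object alone see only the tokens.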
Clause 5. A computer-implemented method, the computer-implemented method comprising:
under control of a computing system comprising one or more computer processors configured to execute specific instructions,
Receiving a request to store input data as a data object in a data store;
Determining, based at least in part on the request, to perform a function to blur a portion of the input data prior to storing the input data as the data object;
Configuring a code execution system to execute the function, wherein the code execution system is configured to provide on-demand execution of the function in an input/output (I/O) path of the data store;
Executing the function using the code execution system prior to storing the input data as the data object, wherein executing the function comprises:
obtaining the input data;
determining to blur a first portion of the input data;
generating a first blurred portion including the first portion in blurred form, and
Generating output data comprising the first blurred portion, wherein the output data does not include the first portion in an unblurred form, and
The output data is stored as the data object in the data store.
Clause 6 the computer-implemented method of clause 5, wherein determining that the first portion of the input data is obscured comprises determining that the first portion comprises private information.
Clause 7. The computer-implemented method of clause 6, wherein the private information represents one or more of personally identifiable information, name, address, age, government issued identity number, date of birth, place of birth, mother's maiden name, account number, or biometric record.
Clause 8 the computer-implemented method of clause 5, wherein generating the first blurred portion comprises:
Determining a unique token corresponding to the first portion;
storing a mapping of the token to the first portion, and
The first portion is replaced with the unique token.
Clause 9 the computer-implemented method of clause 8, wherein storing the mapping of the token to the first portion comprises storing the mapping of the token to the first portion in a storage location having different access permissions than a location in the data store where the data object is stored.
Clause 10, the computer-implemented method of clause 5, wherein generating the first blurred portion comprises encrypting the first portion of the input data using an encryption key.
Clause 11 the computer-implemented method of clause 10, further comprising storing the encryption key and a mapping of the encryption key to the first blurred portion.
Clause 12 the computer-implemented method of clause 5, wherein determining that the first portion of the input data is obscured is based at least in part on a portion of the input data.
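Clauses 10-11 leave the cipher unspecified; production code would use an authenticated cipher such as AES-GCM from a vetted cryptography library. Purely to illustrate the data flow (encrypt the portion, keep the key and its mapping elsewhere) without external dependencies, here is a toy XOR keystream built from SHA-256 — not secure, and not implied by the clauses:

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Derive n pseudo-random bytes from the key (SHA-256 in counter mode).
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def obscure_portion(portion: bytes, key: bytes) -> bytes:
    # XOR the portion with the keystream; applying the same function with
    # the same key reverses the obscuring (XOR is its own inverse).
    return bytes(a ^ b for a, b in zip(portion, _keystream(key, len(portion))))
```

Per clause 11, the key and its mapping to the blurred portion would be stored separately from the data object, so that only holders of the key can recover the first portion.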
Clause 13, a system, the system comprising:
a data store storing a plurality of data objects, and
One or more computing devices in communication with the data store and configured to at least:
receiving a request to store input data as a data object in a data store;
Determining, based at least in part on the request, to perform a function to blur a portion of the input data prior to storing the input data as the data object;
Configuring a code execution system to execute the function, wherein the code execution system is configured to provide on-demand execution of the function in an input/output (I/O) path of the data store;
Executing the function using the code execution system prior to storing the input data as the data object, wherein executing the function comprises:
obtaining the input data;
determining to blur a first portion of the input data;
generating a first blurred portion including the first portion in blurred form, and
Generating output data comprising the first blurred portion, wherein the output data does not include the first portion in an unblurred form, and
The output data is stored as the data object in the data store.
Clause 14 the system of clause 13, wherein the one or more computing devices are further configured to determine to obscure the first portion of the input data by determining that the first portion includes private information.
Clause 15 the system of clause 14, wherein the private information represents one or more of personally identifiable information, name, address, age, government issued identity number, date of birth, place of birth, mother's maiden name, account number, or biometric record.
Clause 16 the system of clause 13, wherein the one or more computing devices are further configured to generate the first blurred portion by:
Determining a unique token corresponding to the first portion;
storing a mapping of the token to the first portion, and
The first portion is replaced with the unique token.
Clause 17 the system of clause 16, wherein the one or more computing devices are further configured to generate the first blurred portion by storing the mapping of the token to the first portion in a storage location having different access permissions than a location in the data store where the data object is stored.
Clause 18 the system of clause 13, wherein the one or more computing devices are further configured to generate the first blurred portion by encrypting the first portion of the input data using an encryption key.
Clause 19 the system of clause 18, wherein the one or more computing devices are further configured to store the encryption key and a mapping of the encryption key to the first blurred portion.
Clause 20 the system of clause 13, wherein the one or more computing devices are further configured to determine to obscure the first portion of the input data based at least in part on a portion of the input data.
Further examples of embodiments of the present disclosure may be described in terms of:
clause 1. A system, the system comprising:
An object storage service comprising one or more computing devices, wherein the object storage service is configured to store a plurality of data objects within an object data store, and
A code execution service comprising one or more computing devices for executing functions on demand in an input/output (I/O) path of the object store service;
Wherein the object store service is configured to at least:
receiving from a client (1) input data as a plurality of input data portions, and (2) a request to store the input data portions as data object portions within the object data store;
Determining that a first function, associated with the request to store the input data portions, for generating a checksum value for an input data portion is to be performed for each input data portion, and
Transmitting a first call to the code execution service to execute the first function on the input data portion of each of the input data portions, and
Wherein the code execution service is configured to at least:
receiving the first call from the object store service to perform the first function on the input data portion, and
Executing the first function, wherein executing the first function causes the code execution service to:
Generating separate checksum values for the input data portion, and
Returning the separate checksum value as first output data, and
Wherein the object store service is further configured to:
Storing the first output data of each of the input data portions as separate checksum data objects;
reassembling the input data using at least some of the input data portions;
Determining that a second function for generating a checksum value for the reassembled input data is to be performed, and
Transmitting a second call to the code execution service to execute the second function on the separate checksum data object, and
Wherein the code execution service is configured to at least:
receiving the second call from the object store service to execute the second function, and
Executing the second function, wherein executing the second function causes the code execution service to:
Generating checksum values for the reassembled input data based on the separate checksum data objects, and
Returning the checksum value of the reassembled input data as second output data, and
Wherein the object store service is further configured to:
storing the second output data as an input data checksum data object;
performing error detection using the second output data to determine whether the input data has been received without error;
reassembling the input data using the input data portion, and
The reassembled input data is stored as a data object within the object data store.
Clause 2 the system of clause 1, wherein executing the first function causes the code execution service to generate the separate checksum value for the input data portion by performing a cyclic redundancy check using the input data portion, and wherein executing the second function causes the code execution service to generate the checksum value for the reassembled input data by performing a cyclic redundancy check using the separate checksum data object.
Clause 3 the system of clause 1, wherein the object storage service is further configured to store the second output data as metadata to the data object.
Clause 4 the system of clause 1, wherein the object storage service is further configured to determine, based on the separate checksum data objects, that one or more input data portions have been received in the presence of an error, and to provide information to the client regarding whether one or more input data portions have been received in the presence of an error.
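The two-level scheme of clauses 1-2 (a CRC per uploaded part, then a checksum derived from the stored checksum data objects) can be sketched with `zlib.crc32`. Note that the combined value below is a checksum over the concatenated per-part checksums — a "checksum of checksums", in the spirit of multipart-upload ETags — not the CRC of the reassembled bytes themselves; that choice is an assumption, since the clauses only require generating the value "based on the separate checksum data objects":

```python
import zlib

def part_checksum(part: bytes) -> int:
    # Per-part CRC-32, computed as each uploaded part arrives.
    return zlib.crc32(part)

def combined_checksum(part_checksums) -> int:
    # Checksum over the concatenated per-part checksum values, derived
    # from the stored checksum data objects rather than from the
    # reassembled bytes themselves.
    buf = b"".join(c.to_bytes(4, "big") for c in part_checksums)
    return zlib.crc32(buf)
```

A true CRC of the whole reassembled object can also be derived from per-part CRCs and part lengths via a `crc32_combine` operation (available in zlib's C API, though not in Python's standard library).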
Clause 5. A computer-implemented method, the computer-implemented method comprising:
under control of a computing system comprising one or more computer processors configured to execute specific instructions,
Receiving input data from a client as a plurality of input data portions via separate upload processes, and receiving a request from the client to store the input data portions as data object portions within a data store;
determining, based at least in part on the request, to perform a first function on each input data portion prior to acknowledging storage of the input data portion as the data object portion;
executing the first function on each of the input data portions using a code execution system, wherein the code execution system provides on-demand execution of a function specified in an input/output (I/O) path of the data store, wherein executing the first function comprises:
Obtaining the input data portion;
executing the first function to generate a first function value for the input data portion, and
Returning to the first function value;
storing the first function value;
Receiving a request to assemble at least some of the input data portions into reassembled input data;
Determining, based at least in part on the request, to perform a second function on the reassembled input data;
executing the second function using the code execution system, wherein executing the second function comprises:
obtaining individual first function values of at least some of the input data portions;
Performing the second function using the separate first function values to generate a second function value for the reassembled input data, and
Returning the second function value of the reassembled input data as second output data, and
And storing the second output data.
Clause 6 the computer-implemented method of clause 5, further comprising determining that a particular input data portion has been received in the presence of an error by using the first function value, and providing information to the client regarding whether one or more input data portions have been received in the presence of an error.
Clause 7 the computer-implemented method of clause 5, further comprising providing the first function value to the client to enable the client to perform error detection on the input data portion.
Clause 8 the computer-implemented method of clause 5, wherein performing the first function comprises generating separate checksum values by performing a cyclic redundancy check using the input data portion, and wherein performing the second function comprises generating checksum values for the reassembled input data by performing a cyclic redundancy check using at least some of the separate checksum values.
Clause 9 the computer-implemented method of clause 5, further comprising reassembling the input data using the input data portion and storing the reassembled input data as a data object in the data store.
Clause 10 the computer-implemented method of clause 9, wherein storing the second output data comprises storing the second output data as metadata to the data object.
Clause 11 the computer-implemented method of clause 9, wherein determining, based at least in part on the request, to perform the second function for the reassembled input data includes determining, based at least in part on the request, to perform the second function on the reassembled input data prior to reassembling the input data.
Clause 12 the computer-implemented method of clause 5, wherein each input data portion comprises a plurality of input data sub-portions, each input data sub-portion having a fixed size, and wherein performing the first function on each of the input data portions comprises performing the first function on each input data sub-portion of each input data portion.
Clause 13, a system, the system comprising:
A data store configured to store a plurality of data objects, and
One or more computing devices in communication with the data store and configured to at least:
Receiving input data from a client as a plurality of input data portions via separate upload processes, and receiving a request from the client to store the input data portions as data object portions within a data store;
determining, based at least in part on the request, to perform a first function on each input data portion prior to storing the input data portion as the data object portion;
executing the first function on each of the input data portions using a code execution system, wherein the code execution system provides on-demand execution of a function specified in an input/output (I/O) path of the data store, wherein executing the first function comprises:
Obtaining the input data portion;
executing the first function to generate a first function value for the input data portion, and
Returning to the first function value;
storing the first function value;
Receiving a request to assemble at least some of the input data portions into reassembled input data;
determining to perform a second function for the reassembled input data based at least in part on the request;
executing the second function using the code execution system, wherein executing the second function comprises:
obtaining individual first function values of at least some of the input data portions;
Performing the second function using the separate first function values to generate a second function value for the reassembled input data, and
Returning the second function value of the reassembled input data as second output data, and
And storing the second output data.
Clause 14 the system of clause 13, wherein the code execution system is further configured to determine that a particular input data portion was received in the presence of an error by using the first function value and to provide the client with information regarding whether one or more input data portions have been received in the presence of an error.
Clause 15 the system of clause 13, wherein the code execution system is configured to provide the first function value to the client to enable the client to perform error detection on the input data portion.
Clause 16 the system of clause 13, wherein the code execution system is further configured to execute the first function to generate separate checksum values by performing a cyclic redundancy check using the input data portion, and to execute the second function to generate checksum values for the reassembled input data by performing a cyclic redundancy check using at least some of the separate checksum values.
Clause 17 the system of clause 13, wherein the one or more computing devices are further configured to reassemble the input data using the input data portion and store the reassembled input data as a data object in the data store.
Clause 18 the system of clause 17, wherein the one or more computing devices are further configured to store the second output data as metadata to the data object.
Clause 19 the system of clause 17, wherein the code execution system is further configured to determine to execute the second function on the reassembled input data based at least in part on the request by determining, based at least in part on the request, to execute the second function on the reassembled input data prior to reassembling the input data.
Clause 20 the system of clause 13, wherein each input data portion comprises a plurality of input data sub-portions, each input data sub-portion having a fixed size, and wherein the code execution system is configured to execute the first function on each of the input data portions by executing the first function on each input data sub-portion of each input data portion.
Further examples of embodiments of the present disclosure may be described in terms of:
clause 1. A system, the system comprising:
An object storage service comprising one or more computing devices, wherein the object storage service is configured to store a plurality of data objects, and
A code execution service comprising one or more computing devices for executing functions on demand in an input/output (I/O) path of the object store service;
Wherein the object store service is configured to at least:
Receiving input data and a request to store the input data as a data object within the object data store, the input data comprising a compound file, wherein the compound file comprises a plurality of individual files and a file identifier for each of the individual files and byte range location information identifying byte range locations of the individual files within the compound file;
Determining that a function, associated with the request to store the input data, for creating an index by extracting the file identifier and the byte range location information from the input data is to be performed prior to storing the input data as the data object, and
Transmitting a call to the code execution service to execute the function on the input data, and
Wherein the code execution service is configured to at least:
receiving the call from the object store service to execute the function, the call including the input data, and
Executing the function, wherein executing the function causes the code execution service to:
Creating an index by extracting the file identifier and the byte range location information from the input data, the index mapping the file identifier to the corresponding byte range location information, and
Returning the index as output data, and
Wherein the object store service is further configured to store the input data as a first data object in the object data store, and to store the output data as a second data object associated with the first data object.
Clause 2 the system of clause 1, wherein the code execution service is further configured to determine file aggregation technique information associated with the compound file, and extract the file identifier and the byte range location information from the input data using the file aggregation technique information.
Clause 3 the system of clause 1, wherein the first data object is associated with the output data via a naming convention or by including metadata with the first data object referencing the second data object.
Clause 4 the system of clause 1, wherein the index enables a client to retrieve a desired portion of the compound file from the object data store without having to retrieve the entire data object from the object data store.
Clause 5. A computer-implemented method, the computer-implemented method comprising:
under control of a computing system comprising one or more computer processors configured to execute specific instructions,
Receiving a request to store input data as a data object in a data store, the input data comprising a set of delimited elements;
Determining, based at least in part on the request, to perform a function to generate an index mapping an element identifier to an element position for each delimited element prior to storing the input data as the data object;
Configuring a code execution system to execute the function, wherein the code execution system provides on-demand execution of the function in an input/output (I/O) path of the data store;
Executing the function using the code execution system prior to storing the input data as the data object, wherein executing the function comprises:
obtaining the input data;
generating an index mapping the element identifier and the element position within the input data, and
Returning the index as output data, and
The output data is stored separately and in association with the data object.
Clause 6. The computer-implemented method of clause 5, further comprising decompressing the input data prior to generating the index.
Clause 7. The computer-implemented method of clause 5, wherein generating the index comprises extracting the element identifier and the element position from the input data.
Clause 8. The computer-implemented method of clause 5, further comprising generating the element identifier, the element position, or both using the delimited elements.
Clause 9. The computer-implemented method of clause 5, wherein storing the output data separately from the data object comprises storing the output data as a second data object that is independently accessible from within the data store.
Clause 10. The computer-implemented method of clause 5, wherein storing the output data comprises using a data storage service to store the output data in a database.
Clause 11. The computer-implemented method of clause 5, wherein the delimited elements comprise one or more of rows, lines, files, comma-separated values, or columns of data.
Clause 12. The computer-implemented method of clause 5, further comprising compressing the input data and storing the compressed input data as the data object.
Clause 13. A system, the system comprising:
A data store configured to store a plurality of data objects, and
One or more computing devices in communication with the data store and configured to at least:
Receiving a request to store input data in the data store as a data object, the input data comprising a set of delimited elements;
Determining, based at least in part on the request, to execute a function to generate an index mapping an element identifier to an element position for each delimited element prior to storing the input data as the data object;
Configuring a code execution service to execute the function, wherein the code execution service is thereby configured to:
obtaining the input data;
generating an index mapping the element identifier to the element position within the input data; and
Returning the index as output data, and
The output data is stored separately and in association with the data object.
Clause 14. The system of clause 13, wherein the code execution service is further configured to decompress the input data prior to generating the index.
Clause 15. The system of clause 13, wherein the code execution service is configured to generate the index by extracting the element identifier and the element position from the input data.
Clause 16. The system of clause 13, wherein the code execution service is further configured to generate the element identifier, the element position, or both, using the delimited elements.
Clause 17. The system of clause 13, wherein the one or more computing devices are further configured to store the output data separately from the data object by storing the output data as a second data object that is independently accessible from within the data store.
Clause 18. The system of clause 13, wherein the one or more computing devices are further configured to store the output data by storing the output data in a database using a data storage service.
Clause 19. The system of clause 13, wherein the delimited elements comprise one or more of rows, lines, files, comma-separated values, or columns of data.
Clause 20. The system of clause 13, wherein the one or more computing devices are further configured to compress the input data and to store the compressed input data as the data object.
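As a concrete illustration of the indexing clauses above, a minimal sketch (not the patented implementation; all names are hypothetical) of a function that could run in the storage input path: it scans delimiter-separated input data and maps each element's identifier (here, its ordinal) to its byte-range position, so that a client can later retrieve a single element with a ranged read instead of fetching the entire data object.

```python
def build_index(data: bytes, delimiter: bytes = b"\n") -> dict[int, tuple[int, int]]:
    """Map each delimited element's identifier (ordinal) to its (start, end) byte range."""
    index = {}
    start = 0
    element_id = 0
    while start < len(data):
        end = data.find(delimiter, start)
        if end == -1:  # last element has no trailing delimiter
            end = len(data)
        index[element_id] = (start, end)  # element identifier -> element position
        element_id += 1
        start = end + len(delimiter)
    return index

data = b"alpha\nbeta\ngamma"
idx = build_index(data)
# A client holding the index can fetch one element via a byte-range read:
start, end = idx[1]
record = data[start:end]  # b"beta"
```

The index would be stored separately from, and in association with, the data object (e.g., as a second data object or a database row), matching clauses 9, 10, 17, and 18.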
All of the methods and processes described above may be embodied in and fully automated via software code modules executed by one or more computers or processors. The code modules may be stored in any type of non-transitory computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in dedicated computer hardware.
Conditional language such as, among others, "can," "could," "might," or "may," unless specifically stated otherwise, is to be understood in context as generally indicating that certain embodiments include certain features, elements, or steps, while other embodiments do not. Thus, such conditional language is not generally intended to imply that features, elements, or steps are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, or steps are included or are to be performed in any particular embodiment.
Disjunctive language such as the phrase "at least one of X, Y, or Z," unless specifically stated otherwise, is to be understood in context as generally indicating that an item, term, etc. may be X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is generally not intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
Articles such as "a" or "an" should generally be construed to include one or more of the described items unless specifically stated otherwise. Accordingly, phrases such as "a device configured to" are intended to include one or more recited devices. Such one or more recited devices may also be collectively configured to carry out the stated recitations. For example, "a processor configured to carry out recitations A, B, and C" may include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
The term "or" should generally be understood to be inclusive, rather than exclusive. Thus, a set containing "a, b, or c" should be construed as containing a set comprising a combination of a, b, and c.
Any routine descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
It should be emphasized that many variations and modifications may be made to the above-described embodiments, and the elements of these variations and modifications should be understood to be among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims (20)

1. A system, the system comprising:
an object storage service comprising one or more computing devices, wherein the object storage service is configured to store a plurality of data objects in an object data store; and
a code execution service comprising one or more computing devices for on-demand execution of functions in an input/output (I/O) path of the object storage service;
wherein the object storage service is configured to at least:
receive, from a client: (1) input data as a plurality of input data portions, and (2) a request to store the input data portions as data object portions in the object data store;
determine that a first function, associated with the request to store the input data portions and used to generate a checksum value for an input data portion, is to be executed for each input data portion; and
transmit a first call to the code execution service to execute the first function on each of the input data portions;
wherein the code execution service is configured to at least:
receive the first call from the object storage service to execute the first function on the input data portions; and
execute the first function, wherein executing the first function causes the code execution service to:
generate a separate checksum value for each input data portion; and
return the separate checksum values as first output data;
wherein the object storage service is further configured to:
store the first output data for each of the input data portions as a separate checksum data object;
reassemble the input data using at least some of the input data portions;
determine that a second function for generating a checksum value for the reassembled input data is to be executed; and
transmit a second call to the code execution service to execute the second function on the separate checksum data objects;
wherein the code execution service is configured to at least:
receive the second call from the object storage service to execute the second function; and
execute the second function, wherein executing the second function causes the code execution service to:
generate a checksum value for the reassembled input data based on the separate checksum data objects; and
return the checksum value of the reassembled input data as second output data; and
wherein the object storage service is further configured to:
store the second output data as an input data checksum data object;
perform error detection using the second output data to determine whether the input data has been received without error;
reassemble the input data using the input data portions; and
store the reassembled input data as a data object in the object data store.
2. The system of claim 1, wherein executing the first function causes the code execution service to generate the separate checksum value for each input data portion by performing a cyclic redundancy check using the input data portion, and wherein executing the second function causes the code execution service to generate the checksum value for the reassembled input data by performing a cyclic redundancy check using the separate checksum data objects.
3. The system of claim 1, wherein the object storage service is further configured to store the second output data as metadata of the data object.
4. The system of claim 1, wherein the object storage service is further configured to determine, based on the separate checksum data objects, that one or more input data portions have been received with errors, and to provide information to the client about whether one or more input data portions have been received with errors.
5. A computer-implemented method, the computer-implemented method comprising:
under control of a computing system comprising one or more computer processors configured to execute specific instructions,
receiving input data as a plurality of input data portions from a client via separate upload processes, and receiving, from the client, a request to store the input data portions as data object portions within a data store;
determining, based at least in part on the request, to execute a first function on each input data portion before confirming storage of the input data portion as the data object portion;
executing the first function on each of the input data portions using a code execution system, wherein the code execution system provides on-demand execution of functions specified in an input/output (I/O) path of the data store, wherein executing the first function comprises:
obtaining the input data portion;
executing the first function to generate a first function value for the input data portion; and
returning the first function value;
storing the first function values;
receiving a request to assemble at least some of the input data portions into reassembled input data;
determining, based at least in part on the request, to execute a second function on the reassembled input data;
executing the second function using the code execution system, wherein executing the second function comprises:
obtaining the separate first function values for at least some of the input data portions;
executing the second function using the separate first function values to generate a second function value for the reassembled input data; and
returning the second function value of the reassembled input data as second output data; and
storing the second output data.
6. The computer-implemented method of claim 5, further comprising determining, using the first function values, that a particular input data portion was received with an error, and providing information to the client regarding whether one or more input data portions have been received with errors.
7. The computer-implemented method of claim 5, further comprising providing the first function values to the client to enable the client to perform error detection on the input data portions.
8. The computer-implemented method of claim 5, wherein executing the first function comprises generating separate checksum values by performing a cyclic redundancy check using the input data portions, and wherein executing the second function comprises generating a checksum value for the reassembled input data by performing a cyclic redundancy check using at least some of the separate checksum values.
9. The computer-implemented method of claim 5, further comprising reassembling the input data using the input data portions and storing the reassembled input data as a data object in the data store.
10. The computer-implemented method of claim 9, wherein storing the second output data comprises storing the second output data as metadata of the data object.
11. The computer-implemented method of claim 9, wherein determining, based at least in part on the request, to execute the second function on the reassembled input data comprises determining, based at least in part on the request, to execute the second function on the reassembled data portions before the input data is reassembled.
12. The computer-implemented method of claim 5, wherein each input data portion comprises a plurality of input data sub-portions, each input data sub-portion having a fixed size, and wherein executing the first function on each of the input data portions comprises executing the first function on each input data sub-portion of each input data portion.
13. A system, the system comprising:
a data store configured to store a plurality of data objects; and
one or more computing devices in communication with the data store and configured to at least:
receive input data as a plurality of input data portions from a client via separate upload processes, and receive, from the client, a request to store the input data portions as data object portions within the data store;
determine, based at least in part on the request, to execute a first function on each input data portion before storing the input data portion as the data object portion;
execute the first function on each of the input data portions using a code execution system, wherein the code execution system provides on-demand execution of functions specified in an input/output (I/O) path of the data store, wherein executing the first function comprises:
obtaining the input data portion;
executing the first function to generate a first function value for the input data portion; and
returning the first function value;
store the first function values;
receive a request to assemble at least some of the input data portions into reassembled input data;
determine, based at least in part on the request, to execute a second function for the reassembled input data;
execute the second function using the code execution system, wherein executing the second function comprises:
obtaining the separate first function values for at least some of the input data portions;
executing the second function using the separate first function values to generate a second function value for the reassembled input data; and
returning the second function value of the reassembled input data as second output data; and
store the second output data.
14. The system of claim 13, wherein the code execution system is further configured to determine, using the first function values, that a particular input data portion was received with an error, and to provide information to the client about whether one or more input data portions have been received with errors.
15. The system of claim 13, wherein the code execution service is configured to provide the first function values to the client to enable the client to perform error detection on the input data portions.
16. The system of claim 13, wherein the code execution service is further configured to execute the first function to generate separate checksum values by performing a cyclic redundancy check using the input data portions, and to execute the second function to generate a checksum value for the reassembled input data by performing a cyclic redundancy check using at least some of the separate checksum values.
17. The system of claim 13, wherein the one or more computing devices are further configured to reassemble the input data using the input data portions and to store the reassembled input data as a data object in the data store.
18. The system of claim 17, wherein the one or more computing devices are further configured to store the second output data as metadata of the data object.
19. The system of claim 17, wherein the code execution service is further configured to determine, based at least in part on the request, to execute the second function on the reassembled input data by determining, based at least in part on the request, to execute the second function on the reassembled data portions before the input data is reassembled.
20. The system of claim 13, wherein each input data portion comprises a plurality of input data sub-portions, each input data sub-portion having a fixed size, and wherein the code execution service is configured to execute the first function on each of the input data portions by executing the first function on each input data sub-portion of each input data portion.
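A minimal sketch of the two-stage checksum scheme the claims describe, under the assumption that the "first function" hashes each uploaded part and the "second function" derives a whole-object checksum from the stored per-part checksum objects alone (in the style of composite multipart checksums); the claims leave the checksum algorithm open (e.g., a cyclic redundancy check), and all names below are hypothetical.

```python
import hashlib

def first_function(part: bytes) -> bytes:
    # Per-part checksum, computed in the input path as each
    # input data portion arrives; stored as a checksum data object.
    return hashlib.md5(part).digest()

def second_function(part_checksums: list[bytes]) -> str:
    # Whole-object checksum derived from the per-part checksum data
    # objects alone -- the reassembled object bytes are never re-read.
    combined = hashlib.md5(b"".join(part_checksums)).hexdigest()
    return f"{combined}-{len(part_checksums)}"

parts = [b"input ", b"data ", b"portions"]
checksum_objects = [first_function(p) for p in parts]
object_checksum = second_function(checksum_objects)
```

For error detection, the client can compute the same composite value over its local parts and compare it with the stored second output data; a mismatch indicates that one or more portions were received with errors.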
CN202510446214.XA | Priority date 2019-09-27 | Filing date 2020-09-23 | On-demand code obfuscation of data in input path of object storage service | Pending | CN120371743A (en)

Applications Claiming Priority (8)

Application Number | Priority Date | Filing Date | Title
US16/586,816 (US11386230B2) | 2019-09-27 | 2019-09-27 | On-demand code obfuscation of data in input path of object storage service
US16/586,816 | 2019-09-27
US16/586,825 (US11023311B2) | 2019-09-27 | 2019-09-27 | On-demand code execution in input path of data uploaded to storage service in multiple data portions
US16/586,818 | 2019-09-27
US16/586,818 (US10996961B2) | 2019-09-27 | 2019-09-27 | On-demand indexing of data in input path of object storage service
US16/586,825 | 2019-09-27
PCT/US2020/052280 (WO2021061820A1) | 2019-09-27 | 2020-09-23 | On-demand code obfuscation of data in input path of object storage service
CN202080073408.5A (CN114586020A) | 2019-09-27 | 2020-09-23 | On-demand code obfuscation of data in an input path of an object storage service

Related Parent Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202080073408.5A (Division; CN114586020A) | 2019-09-27 | 2020-09-23 | On-demand code obfuscation of data in an input path of an object storage service

Publications (1)

Publication Number | Publication Date
CN120371743A | 2025-07-25

Family

Family ID: 72753024

Family Applications (2)

Application NumberTitlePriority DateFiling Date
CN202510446214.XAPendingCN120371743A (en)2019-09-272020-09-23On-demand code obfuscation of data in input path of object storage service
CN202080073408.5APendingCN114586020A (en)2019-09-272020-09-23On-demand code obfuscation of data in an input path of an object storage service

Family Applications After (1)

Application NumberTitlePriority DateFiling Date
CN202080073408.5APendingCN114586020A (en)2019-09-272020-09-23On-demand code obfuscation of data in an input path of an object storage service

Country Status (3)

Country | Link
EP (1) | EP4035047A1 (en)
CN (2) | CN120371743A (en)
WO (1) | WO2021061820A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11263220B2 (en) | 2019-09-27 | 2022-03-01 | Amazon Technologies, Inc. | On-demand execution of object transformation code in output path of object storage service
US11416628B2 (en) | 2019-09-27 | 2022-08-16 | Amazon Technologies, Inc. | User-specific data manipulation system for object storage service based on user-submitted code
US11360948B2 (en) | 2019-09-27 | 2022-06-14 | Amazon Technologies, Inc. | Inserting owner-specified data processing pipelines into input/output path of object storage service
US11656892B1 (en) | 2019-09-27 | 2023-05-23 | Amazon Technologies, Inc. | Sequential execution of user-submitted code and native functions
US11394761B1 (en) | 2019-09-27 | 2022-07-19 | Amazon Technologies, Inc. | Execution of user-submitted code on a stream of data
US11550944B2 (en) | 2019-09-27 | 2023-01-10 | Amazon Technologies, Inc. | Code execution environment customization system for object storage service
US12299149B2 (en) | 2021-05-17 | 2025-05-13 | The Toronto-Dominion Bank | Secure deployment of de-risked confidential data within a distributed computing environment
US11550955B1 (en) * | 2021-07-20 | 2023-01-10 | Red Hat, Inc. | Automatically anonymizing data in a distributed storage system
CN114218597B (en) * | 2021-12-30 | 2023-10-10 | 北京荣达天下信息科技有限公司 | Method and system suitable for privacy data confidentiality in enterprises

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9647989B2 (en) * | 2011-04-27 | 2017-05-09 | Symantec Corporation | System and method of data interception and conversion in a proxy
WO2012145825A1 (en) * | 2011-04-27 | 2012-11-01 | Perspecsys Inc. | System and method for data obfuscation in interception of communication with a cloud
US9323556B2 (en) * | 2014-09-30 | 2016-04-26 | Amazon Technologies, Inc. | Programmatic event detection and message generation for requests to execute program code
US10887291B2 (en) * | 2016-12-16 | 2021-01-05 | Amazon Technologies, Inc. | Secure data distribution of sensitive data across content delivery networks
US20180285591A1 (en) * | 2017-03-29 | 2018-10-04 | Ca, Inc. | Document redaction with data isolation

Also Published As

Publication number | Publication date
EP4035047A1 | 2022-08-03
WO2021061820A1 | 2021-04-01
CN114586020A | 2022-06-03

Similar Documents

Publication | Title
US11386230B2 (en) | On-demand code obfuscation of data in input path of object storage service
CN114730269B (en) | User-specific data manipulation system for object storage services based on user submission code
CN114586010B (en) | On-demand execution of object filtering code in output path of object store service
CN120371743A (en) | On-demand code obfuscation of data in input path of object storage service
US10996961B2 (en) | On-demand indexing of data in input path of object storage service
US11263220B2 (en) | On-demand execution of object transformation code in output path of object storage service
US11023311B2 (en) | On-demand code execution in input path of data uploaded to storage service in multiple data portions
US10908927B1 (en) | On-demand execution of object filter code in output path of object storage service
US11106477B2 (en) | Execution of owner-specified code during input/output path to object storage service
US11023416B2 (en) | Data access control system for object storage service based on owner-defined code
CN114586011B (en) | Inserting an owner-specified data processing pipeline into an input/output path of an object storage service
US11550944B2 (en) | Code execution environment customization system for object storage service
US11250007B1 (en) | On-demand execution of object combination code in output path of object storage service
US11416628B2 (en) | User-specific data manipulation system for object storage service based on user-submitted code
US11055112B2 (en) | Inserting executions of owner-specified code into input/output path of object storage service
US9824233B2 (en) | Posixly secure open and access files by inode number
US20180307524A1 (en) | Executing code referenced from a microservice registry
US11360948B2 (en) | Inserting owner-specified data processing pipelines into input/output path of object storage service
US11966370B1 (en) | Pseudo-local multi-service enabled file systems using a locally-addressable secure compute layer
US11394761B1 (en) | Execution of user-submitted code on a stream of data
US11656892B1 (en) | Sequential execution of user-submitted code and native functions
GB2561862A (en) | Computer device and method for handling files
US12197397B1 (en) | Offloading of remote service interactions to virtualized service devices
Ruan | Policy-based analysis and frameworks for non-consumptive research

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
