Movatterモバイル変換


[0]ホーム

URL:


CN105765575B - Data flow intake and persistence technology - Google Patents

Data flow intake and persistence technology
Download PDF

Info

Publication number
CN105765575B
CN105765575BCN201480061590.7ACN201480061590ACN105765575BCN 105765575 BCN105765575 BCN 105765575BCN 201480061590 ACN201480061590 ACN 201480061590ACN 105765575 BCN105765575 BCN 105765575B
Authority
CN
China
Prior art keywords
data
node
record
subregion
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480061590.7A
Other languages
Chinese (zh)
Other versions
CN105765575A (en
Inventor
M·M·泰默
G·D·格海尔
J·D·杜纳根
G·伯吉斯
熊颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/077,162external-prioritypatent/US9858322B2/en
Priority claimed from US14/077,171external-prioritypatent/US9720989B2/en
Application filed by Amazon Technologies IncfiledCriticalAmazon Technologies Inc
Publication of CN105765575ApublicationCriticalpatent/CN105765575A/en
Application grantedgrantedCritical
Publication of CN105765575BpublicationCriticalpatent/CN105765575B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

A kind of programming interface is realized, so that the client of flow management service can select the data intake strategy for data flow.Receive the client request that strategy is absorbed in selection at least once.According to the intake strategy at least once, the instruction of data record can be transferred to the server one or manyly by client, until receiving affirmative acknowledgment.Multiple transmission that specific data record is indicated in response to receiving, are sent to the client for corresponding affirmative acknowledgment.In response to a specific transmission in the multiple transmission, it is used for the persistence strategy of the stream based on selection, the copy of the data record is stored at one or more storage locations.

Description

Data flow intake and persistence technology
Background technique
Since the cost of the storage of data for many years has reduced and due to by the various element interconnections of computer infrastructureAbility improved, potentially can acquire and analyze more and more data related with a variety of application programs.For example, movingMobile phone can produce the data of their place of instruction, the application program used by telephone subscriber etc., can acquire and analyzeAt least some of described data that customization discount coupon, advertisement etc. are presented for user.By point of the data of monitor camera acquisitionAnalysis may be useful for preventing and/or solving crime, and be embedded in from aeroengine, automobile or complicated machinery eachThe data of sensor acquisition at kind place can be used for various purposes, and such as preventive maintenance improves efficiency and reduce cost.
The increased use that the volume increase of flow data has been accompanied by commercial hardware (and can pass through quotient in some casesIt is possibly realized with the increased use of hardware).Going out for virtualization technology for commercial hardware has been management for manyThe large-scale calculations resource of the application program of type provides benefit, to allow various computing resources efficiently and safely by moreA client is shared.For example, virtualization technology can be by providing for each user by one of single one physical computer trustshipOr multiple virtual machines and allow the single one physical computer to share between a plurality of users, wherein each such virtual machine fillsWhen the software of Different Logic computing system is simulated, the software is modeled as user and provides with oneself to be that given hardware is calculated and providedThe sole operators in source and the illusion of administrator, while additionally providing the isolation of the application program between various virtual machines and safetyProperty.In addition, some virtualization technologies are capable of providing the virtual resource across two or more physical resources, such as with leapThe single virtual machine of multiple virtual processors of multiple and different physical computing systems.In addition to computing platform, some large organizations are alsoThe various types of storage services established using virtualization technology are provided.By using this storage service, mass data can be depositedStorage has required life level.
Although from various suppliers can relatively low cost obtain virtual computing and/or storage resource, it is largerDynamic fluctuation data flow acquisition, storage and processing management and layout be for various reasons still challenge proposition.When more resources be added to system be arranged for handling biggish data flow when, such as be likely to occur system different piece itBetween workload it is unbalanced.If do not solved, in addition to other resources underuse (and loss therefore) other than,The so this unbalanced severe performance problem that may cause in some Energy Resources Service.Client is also possible to the peace to their flow dataFull property worry, or if this data or result are stored at the uncontrollable facility of client, also to analysis fluxionAccording to result worry.In the case where increasing frequency when increasing in distributed system size, propensity is in the failure of generation(such as internuncial lose once in a while and/or hardware fault) may also must efficiently solve, to prevent the acquisition of fluid stopping data, storageOr the expensive interruption of analysis.
Detailed description of the invention
Fig. 1 is provided to be summarized according to the simplifying for data flow concept of at least some embodiments.
Fig. 2 is provided according at least some embodiments in Workflow Management System (SMS) and adopting including the stream process stageThe general introduction of data flow among each sub-components of the stream processing system (SPS) of collection.
Fig. 3 shows the example of achievable respective sets programming interface at SMS SPS according at least some embodiments.
Fig. 4 shows the exemplary network-based interface according at least some embodiments, and the interface can be realized to makeThe figure in stream process stage can be generated by obtaining SPS client.
Fig. 5, which is shown, submits interface and record inspection according to the achievable programmed recording at SMS of at least some embodimentsThe example of rope interface.
Fig. 6 shows the exemplary elements of the intake subsystem according to the SMS of at least some embodiments.
Fig. 7 shows the exemplary elements of the storage subsystem of the SMS according at least some embodiments.
Fig. 8 shows the exemplary elements and retrieval subsystem of the retrieval subsystem of the SMS according at least some embodimentsWith the example of the interaction of SPS.
The example that Fig. 9 shows the redundant group for establishing the node for SMS or SPS according at least some embodiments.
Figure 10 shows the provider network environment according at least some embodiments, wherein the node of given redundant group can divideCloth is in multiple data centers.
Figure 11 shows multiple placements of the node that can be selected for SMS or SPS according at least some embodimentsDestination.
Figure 12 a and Figure 12 b be shown respectively according at least some embodiments can be by SPS client and SMS clientThe example of the secure option request of submission.
Figure 13 a shows the showing between flow data manufacturer and the intake node of SMS according at least some embodimentsExample sexual intercourse is mutual.
Figure 13 b shows the data record that intake can be generated at SMS according at least some embodimentsThe exemplary elements of sequence number.
Figure 14 shows the orderly reality stored and retrieve of the flow data record at SMS according at least some embodimentsExample.
Figure 15 shows according to the mapping of the flow point area of at least some embodiments and can be directed to what SMS and SPS node be madeThe example of corresponding configuration decisions.
Figure 16 shows the example according to the dynamic stream of at least some embodiments again subregion.
Figure 17 is to may perform to support for data record intake and data note according to showing at least some embodimentsRecord the flow chart of the operating aspect of the respective sets programming interface of retrieval.
Figure 18 a be according at least some embodiments show may perform to configuration the stream process stage operating aspect streamCheng Tu.
Figure 18 b is the client shown in response to the configuration for stream process working node according at least some embodimentsThe flow chart for the operating aspect for holding the component invocation in library executable.
Figure 19 is to may perform to realize for the one or more extensive of stream process according to showing at least some embodimentsThe flow chart of the operating aspect of multiple strategy.
Figure 20 is to may perform to realize a variety of secure options for being used for data flow according to showing at least some embodimentsOperating aspect flow chart.
Figure 21 is to show the behaviour that may perform to realize the partitioning strategies for data flow according at least some embodimentsMake the flow chart of aspect.
Figure 22 is to may perform to realize the behaviour of the dynamic subregion again of data flow according to showing at least some embodimentsMake the flow chart of aspect.
Figure 23 is to may perform to realize for data flow record at least once according to showing at least some embodimentsThe flow chart of the operating aspect of record intake strategy.
Figure 24 is to may perform to realize a variety of persistence plans for being used for data flow according to showing at least some embodimentsThe flow chart of operating aspect slightly.
The example that Figure 25 shows the stream processing system according at least some embodiments, the wherein working node of processing stageCoordinate their workload using database table.
Figure 26 shows being storable in the subregion allocation table coordinated for workload according at least some embodimentsExemplary entries.
Figure 27 is shown can be selected by the execution of the working node in stream process stage at it according at least some embodimentsThe operating aspect of the upper subregion for executing processing operation.
Figure 28 is shown can be based on by the execution of the working node in stream process stage from stream according at least some embodimentsThe operating aspect for the information update subregion allocation table that management service control subsystem obtains.
Figure 29 shows the load balancing that can be executed by the working node in stream process stage according at least some embodimentsThe aspect of operation.
Figure 30 be show can the exemplary computing devices used at least some embodiments block diagram.
Although describing embodiment party herein by the mode for the example for enumerating several embodiments and schematic figuresCase, it will be recognized by one skilled in the art that embodiment is not limited to described embodiment or attached drawing.It should be understood that attachedFigure and detailed description are not intended to for embodiment to be limited to particular forms disclosed, but on the contrary, it is intended toCover all modifications, equivalent and the alternative solution in the spirit and scope for falling into and being defined by the appended claims.ThisAny title used herein is only used for organizational goal, and is not intended the model for limiting description or claimsIt encloses.Such as through used herein, word "available" be with allow meaning (i.e., it is meant that it is possible that) rather than force meaning (that is, meaningTaste must) use.Similarly, word " including (include/including/includes) " mean include but unlimitedIn.
Specific embodiment
It describes and is designed to manipulate hundreds of or even thousands of concurrent data manufacturers and data disappear for managingThe creation of the large-scale data stream of expense person, the method and apparatus for storing, retrieving and processing various embodiments.As used hereinTerm " data flow " refer to and can be generated by one or more data manufacturers and be accessed by one or more data consumersThe sequence of data record, wherein each data record is assumed to the byte of constant sequence.Flow management service (SMS) can provide programmingInterface (such as application programming interface (API), webpage or network address, graphical user interface or command-line tool) enables toCreation, configuration and the deletion flowed, and submission, storage and the retrieval of flow data record in some embodiments.It relates toAnd it (such as stream creation or is deleted or described below various dynamic with the operation of some type of stream of the interaction of SMS control unitState division operation again) can referred to herein as " control plane " operation, and be not usually required to interact with control unit allAs data record is submitted, storage and the operation retrieved can referred to herein as " data plane " operate.The meter of dynamic offer groupCalculation, storage and Internet resources can be used to for example realize the service based on partitioning strategies in some this embodiments, describedPartitioning strategies allow flow management workload to be allocated in many Service Parts in mensurable mode, as follows further detailedCarefully describe.Abbreviation SMS can be used to refer to flow management service herein, and also refer to including the void for realizing flow management serviceThe Workflow Management System of quasi- and/or physical resource acquisition.
In various embodiments, some consumers of SMS can develop the application journey for calling directly SMS programming interfaceSequence.However, in addition to SMS interface, the abstract of greater degree can be provided for consumer or using journey at least some embodimentsThe processing frame of sequence grade, the processing frame, which can simplify, to be not intended to for those using the lower grade directly supported by SMSFlow management function come development and application program client stream process various aspects.This frame can provide its own(such as established on the top of SMS interface) programming interface, so that consumer can grasp with the flow management of lower gradeMake compared to the business logic for focusing more on stream record realization to be used.The frame of higher level can be realized as in some embodiment partyStream process service (SPS) with its own control plane and data plane components in case, the stream process service can providePremium Features, such as being applied to the automation resource of stream process, the automation failure transfer for handling node, constructing at any streamIt manages the ability of work flow diagram table, support temporary current, the dynamic based on workload variation or other trigger conditions subregion etc. again.InIn at least some embodiments, flow management service, stream process service or two kinds of services be may be implemented as in virtualized environmentMulti-tenant management Network Accessible Service.That is, various physical resource (such as computers in such an implementationServer or host, storage device, network equipment etc.) it can at least be shared in the stream of different consumers in some cases,Without make consumer accurately recognize resource be how to share or do not need even to make at all consumer recognize toDetermine resource be shared.The service of the control unit management of the multi-tenant flow management and/or processing of management can dynamically be addedAdd, remove or based on it is various can application strategy reconfigure the node or resource that are used for specific stream, some strategies can be withIt is that client is selectable.In addition, control unit can be also responsible for significantly realizing various types of security protocols (for example, coming trueThe streaming application for protecting a client can not access the data of another client, though at least some hardware or software can by this twoName client is shared), monitoring resource uses for charging, generates and can be used for auditing or the record information debugged etc..From the more of managementFrom the point of view of the customer perspective of tenant's service, it can be eliminated by control/management function that the service is realized and support extensive stream application journeyMany complexity involved in sequence.In some situations, the consumer of this multi-tenant service can be indicated for extremelyThey are not intended to shared resource for few some type of stream relevant operation, in this case, can be extremely for the operation of those typesThe some physical resources of major general be temporarily designated as single tenant (that is, being limited to represent single consumer or client and executeOperation).
Can take in various embodiments realize SMS and/or SPS control plane and data plane operations it is a variety of notSame method.For example, the control server or section of redundant group can be set in some implementations about control plane operationsPoint.Redundant group may include multiple control servers, a server in the multiple control server be designated based on serviceDevice, the primary server is responsible for responding the management request about various streams, and another server can be designated to such asMaster is taken in the case where the trigger condition for generating failure (or the connectivity for losing primary server) at current primary serverServer.In another implementation, it can be used to deposit in one or more tables of the Internet-accessible database service center creationStorage is used for the control plane metadata (such as subregion mapping) of various streams, and various intakes, storage or retrieval node can energyEnough the table is accessed depending on the needs of the subset of metadata needed for obtaining data plane operation.Provided hereinafter about in differenceThe details of the various aspects of SPS and SMS data plane in embodiment and control plane function.It should be noted that realizing flow tubeIn some embodiments for managing service, it may not be necessary to realize and the stream process service of higher level primitive is provided.In other implementationsIn scheme, the high-grade programming interface of stream process service only can be exposed to consumer, and by lower grade usedFlow management interface may be disabled client.
According to some embodiments, Workflow Management System may include multiple subsystems that independently can configure, the subsystem packetIt includes the record intake subsystem for being mainly responsible for acquisition or acquiring data record, be mainly responsible for according to applicable persistence or durabilityStrategy saves the record storage subsystem of data record content and is mainly responsible for response for the note of the read requests of storage recordRecord retrieval subsystem.It can also realize that control subsystem, the control subsystem include one or more in some embodimentsManagement or control unit, one or more of management or control unit are responsible for for example, by being dynamically determined and/or initializing useIn the node of each of intake, storage and retrieval subsystem at selected resource (such as virtually or physically server)Requirement configures its sub-systems.Each of intake, storage, retrieval and control subsystem can be used corresponding multiple hardPart and/or software component realize, the multiple hardware and/or software component can be collectively referenced as subsystem " node " or" server ".Therefore, the various resources of SMS can logically belong to one of four kinds of functional types: absorb, store,Retrieval and control.In some implementations, respective sets control unit can be established for each of other subsystems, such asIndependent intake control subsystem, storage control subsystem and/or retrieval control subsystem can be achieved.This control subsystem can be withIt is each responsible for identifying the resource of other nodes for corresponding to subsystem and/or is responsible for response from client or from other sonsThe management of system is inquired.In some implementations, the node pool for being able to carry out various types of SMS and/or SPS functions can be withIt is configured in advance, and the selected member of those node pools can according to need and be assigned to new stream or new processing rankSection.
Stream partitioning strategies and associated mapping can be achieved, at least some embodiments for example to take the photograph at different groupsTake, store, retrieving and/or control node between distribute data record subset.For example, based on selection for specific data streamPartitioning strategies and based on other factors (such as record uptake ratio and/or recall ratio desired value), control unit can determine mostHow many node (for example, process or thread) just should be established (i.e. at stream creation time) to be used to absorb, store and retrieve, andHow these nodes should be mapped to virtual machine and/or physical machine.Over time, work associated with given streamAmount can increase or decrease, this may cause the subregion again of stream (in addition to other trigger conditions).This subregion again can be related toThe variation of various parameters, such as function for the subregion of determining record, the subregion key used, the sum of subregion, intake node,Memory node or the quantity for retrieving node, or the placement of different physics or the node on virtual resource.In at least some implementationsIn scheme, subregion can use the skill described in further detail below in the case where no interruption data record flowing againArt is dynamically realized.Different partition scheme and subregion trigger criteria in some embodiments can be for example based on client againThe inspiration of the parameter or based on SMS control node of offer and be used for different data flows.In some embodiments, perhaps haveThe quantity and/or frequency of subregion may be for example limited again based on customer priorities, the life expectancy of stream or other factors.
Many different record intake strategies and interface can be realized in different implementation scenarios.For example, in some implementationsIn scheme, client (such as being configured to represent the executable component or module of the programming interface of consumer's calling SMS of SMS)Interface is submitted or with reference to submission interface using online.For submitting online, the interior perhaps ontology of data record is in this implementationIt can be included as submitting the part of request in scheme.In contrast, it is submitted in request in reference, it is possible to provide (such as deposit addressStorage device address, data-base recording address or URL (uniform resource locator)), the interior perhaps ontology of data record can be from instituteState address acquisition.In some implementations, can also or alternatively mixing be supported to submit interface, wherein before data recordN number of byte can be included in line, and remainder bytes (if any) are provided as referring to.It is shorter under this situationRecord (its ontology be less than N byte long) can be completely by submitting request specified, and the part of longer record may have to be from rightAddress is answered to obtain.
It, in some embodiments, can also be real in addition to the different substitutes during intake for designated recorder contentNow related various confirmations or deduplication with intake strategy.For example, for some streaming applications, client be may want to ensure thatEach and all data records are reliably absorbed by SMS.In large-scale distributed flow management environment, data packet may lose,Or various failures may be generated every now and then along the path between data manufacturer and intake node, this may potentially leadCause the loss of data of some submissions.Therefore, in some embodiments, SMS can realize intake strategy at least once, according to describedStrategy record submitter one or many can submit identical record until receiving affirmative acknowledgment from intake subsystem.JustUnder normal operating condition, record can be submitted once, and submitter can after received intake node has obtained and stored recordReceive confirmation.If it is confirmed that losing or delay, or itself lost if record submits request, submitter can it is primary orIdentical data record is repeatedly resubmited, until eventually receiving confirmation.It can be for example based on by mentioning if absorbing nodeFriendship person receives the so described record of confirmation for the expectation that do not resubmit to generate the confirmation for each submission, regardless of describedWhether submit is duplicate.However, intake node can be responsible for identifying identical data record at least some embodimentsIt has been filed on repeatedly, and is responsible for avoiding unnecessarily storing the latest copy of repeated data.In one embodiment, can support to- one version of the strategy of intake at least once (can be described as " absorbing at least once, without repetition ") of few two versions, wherein SMS is negativeDuty deduplication data record (is submitted only in response to one in one group of two or more submission to ensure that data are stored inAt SMS storage subsystem) and a version, wherein the repetition stored by the data record of SMS is allowed (to can be described as " at leastOnce, allow to repeat ").It is described at least once, allow duplicate method to may be useful for streaming application, wherein depositingSeldom or there is no the duplicate negative consequences of data record, and/or the stream application for executing the repeated elimination of themselvesProgram may be useful.It can also support other intake strategies, such as absorb strategy as possible, wherein not needing to mention for allThe confirmation of the data record of friendship.If absorbing strategy as possible to come into force at least some embodiments, low volume data recordLoss be acceptable.It is any that client can select them to be desirable for for various streams in various embodimentsIntake strategy.
About the storage of stream record, many alternative strategies can also be supported at least some embodiments.For example, clientEnd can select persistence strategy, these sides of the policy control record storage from several strategies supported by SMSFace is such as: the quantity of data-oriented transcript to be stored, be ready to use in copy memory technology type (such as volatibility orNon-volatile ram, the storage device based on spinning disk, solid-state device (SSD), network attached storage device etc.) etc..For example,If client is that the storage device based on disk selects n times to replicate persistence strategy, data record submission may not be recognizedTo be that N number of corresponding disk set is completely written safely in N number of copy of record until.Wherein using depositing based on diskIn at least some embodiments of storage device, SMS storage subsystem can be attempted then to remember the data of the input of given subregionRecord write-in disk, such as influenced to avoid the performance of disk tracking.Various technologies as described below can be used to generate sequenceNumber be used for (and being stored with) data record, including for example can based on intake the time come carry out ordered record retrieval based on the timeThe technology of stamp.In at least some embodiments, the data record of given subregion can be stored together, such as on diskContinuously and with the data record of other subregions dividually store.In some implementations, according to retention strategy (by clientEnd selection is selected by SMS) or deduplication time window strategy (indicate submit after any data-oriented record whenPhase, within the period, SMS, which may need to ensure that, is not stored in SMS storage subsystem for the data-oriented transcriptIn system, even if having submitted some copies), at least some data records can be archived to different types of storage device and/or oneIt is deleted after section period from SMS.This removal operation can be described as stream " finishing " herein.In some embodiments, clientStream finishing request, such as notice SMS can be submitted no longer to need specified data to record and therefore from the visitor for submitting finishing requestThe specified data record can be deleted from the perspective of the end of family, or clearly specified data record is deleted in request.CanUnder the situation that can have the multiple client of the data record of consumption given stream, SMS can be responsible for ensuring it is given be recorded in its byIt is not prematurely deleted or is modified before all interested consumer's access.In some implementations, if there is givenN number of data consumer of stream, then SMS may wait for all N number of until having determined that before the given record R for deleting the streamData consumer has read or has processed R.For example, SMS can be requested or be based on data based on the corresponding finishing from consumerConsumer has proceeded to the corresponding instruction of what degree in the stream to determine that R has been read by all consumers.In some realitiesIt applies in scheme, some type of data consumer (such as test related application) is subjected to accessing at least data recordSmall subset before they are deleted.Therefore, at least some embodiments, application program can lead to before retrievalKnow the acceptability that SMS is deleted about data, and SMS can arrange to delete according to notice.In some embodiments, it achievesThe data that strategy can for example be implemented as the type for the storage device that instruction such as flow data record should be copied to retainThe part of strategy and the arrangement strategy for this copy.
In at least some embodiments, can also support multiple programming interface for record retrieval.In an embodimentIn, the method based on iterator can be used, one of programming interface (such as obtaining iterator (getIterator)) is availableCome instantiation at (such as based on sequence number or timestamp) the specified logical offset in the subregion of stream and positioning iterator or refer toNeedle.Different programming interface (such as obtaining next record (getNextRecords)) can be subsequently used to working as from iteratorFront position starts sequentially to read certain amount of data record.The instantiation of iterator can actually allow client flowingAny or random starting position for recording retrieval is specified in subregion.In such an implementation, if client is wishedWith random access mode reads data log, then client may have to repeatedly create new iterator.Based on rotationTurn in the storage system of disk, the tracking of disk needed for frequent random access may influence significantly the I/O response time.CauseThis, at least some embodiments, when the motivation of client be sequentially rather than when randomly reading flow data record, with quiltRandom-read access can be applied to by being applied to different (such as higher) charge rate of sequence read access.Thus, for example, oneIn a little implementations, client can each acquisitions iterator calling be billed X monetary unit, and pass through the next note of acquisitionEach record of record retrieval is billed Y monetary unit, wherein X > Y.When alternative client-side interface is supported for other behaviourWhen making type (such as absorbing), at least some embodiments, charge rate or price for substitute may also be different, exampleIf client may submit reference request more than for submitting request to ask a price online, as client may be for randomIt reads more than reading charge for sequence.In various embodiments, other factors may also influence charging, and such as data are rememberedThe size of record, the distribution write to read request at any time, selected persistence strategy etc..
According to some embodiments, stream process service (SPS) allows client specified appointing including many processing stagesAnticipate complicated processing workflow, wherein the processing of given stage place execution output can be used as zeroth order section or more itsThe input in his stage.In some embodiments, (similar to for for absorbing, storing and the SMS of retrieved data record is retouchedThose of state) partitioning strategies can be used to divide processing workload in multiple working nodes at each stage.It is this at one, it can be achieved that programming SPS interface is so that client can specify the various configurations for any given stage in embodimentSetting, including for example for the stage (for example, one or more streams that data record needs to be retrieved from it are together with for describedThe partitioning strategies of stream) input data source, have and stay in the processing operation executed at the stage and for from the stageOutput or result distribution descriptor or specification (for example, output whether be saved in the form of different stream storage location,It is sent to network endpoint or is fed in other one or more processing stages).In at least some embodiments, specifies and useProcessing operation in the SPS stage can be idempotent: that is, if given processing operation is performed in identical input dataRepeatedly, if then operating result will not be different from operating only being executed once acquired result.If processing operation isIdempotent, then restore to be simplified from failure (such as working node failure at the SPS stage), it is as follows furtherDetailed description.According to some embodiments, the processing operation of non-idempotent is allowed at some or all of SPS stages.
It is at least partially based on configuration information, such as inlet flow partitioning strategies pass through the received place of SPS programming interface with subsequentThe property of operation is managed, SPS control server can determine that how many initial working node has use to be placed in various embodimentsIn each stage of processing workflow.When determining the initial number and placement of working node, it is also contemplated that being ready to use inThe executive capability of the resource of working node (for example, virtual machine currently in use or physical machine).Selected quantity can be instantiatedWorking node (working node can each include executable thread or executable process in some implementations).OftenA working node can be configured, such as with from input resource (such as retrieval node from one or more flow point areas) appropriateData record is obtained, specified processing operation is executed in data record, and processing result is transmitted to specified output purposeGround.In addition, check point scheme may be implemented at least some embodiments, according to the check point scheme, given workNode can be configured to store progress record or indicate the inspection of processed part at that working node of subregionIt tests a little, wherein assuming that partitioned record is sequentially processed.In some implementations, working node can be for example by progress recordPeriodically (for example, every N seconds or every R processed data records) write-in permanent storage device and/or response come fromThe check point of SPS control server is requested.
In some embodiments, progress record can be used for the fast quick-recovery from working node failure.For example, SPS is controlledServer for example can utilize horizontal (such as cpu busy percentage, I/O utilization ratio of device using heartbeat mechanism and/or by monitoring resourceOr network utilization is horizontal) monitor the health status of each working node at any time.In response to being made by SPS control serverParticular job node be in do not need or unhealthy condition (for example, if its be without response or overload) determination, replacementWorking node can be instantiated to take over the responsibility of particular job node.Replacing working node may have access to by replacement working nodeThe nearest progress record of storage, to identify the described group of data record replacing working node and should handling.It is in processing operationIn the embodiment of idempotent, even if some operations are repeated (for example, since nearest progress record is in the reality of replacement working nodeThe some time is written into before exampleization), the total result of the processing will not be influenced by failure and replacement.In some realization sidesIn formula, in addition to storage instruction given stream or subregion by the progress record of its processed subset, working node can also quiltIt configures to store the application state information of accumulation.For example, if stream process workflow is responsible for based on analysis instruction serviceThe flow data of service index records to determine the client charging total value for special services, then working node can scheduled storeFor the charging total value for the accumulation that various clients determine.
In at least some embodiments, SPS control server can also be configured to ring by starting other actionShould be in various other triggerings, such as change workload level or the workload detected are unbalance (for example, if for a subregionUptake ratio be disproportionately higher than those of other subregions uptake ratio), take action all for example each phase requests of described other are defeatedThe dynamic to become a mandarin again subregion, at given stage place change the quantity of working node of the distribution to given subregion, for some stagesIt distributes the working node of higher performance or is transferred to working node with the another of different performance ability from a physical resourceOne physical resource.In some embodiments, for example, being directed to Given Order in response to needing of being made by SPS control serverSection completes the determination of recovery policy (rather than based on recovery policy of check point) as possible, and the progress record of the type described above canNot stored by the working node at least some SPS stages.In this some implementations of recovery policy as possible, work is replacedSimple process can be carried out to them when receiving new data record by making node, without accessing progress record.SomeIn embodiment, if client wishes to realize recovery policy of doing the best at the SPS stage, the stream executed at the stageProcessing operation not necessarily needs to be idempotent.The non-idempotent processing operation executed on stream record at the SPS stage is stayed in havingIn embodiment, the recovery based on check point may not be supported, and the different recovery schemes such as restored as possible can be used.In at least one embodiment, it can be operated at the SPS stage only to allow the stream process of idempotent.
The data record of some streams may include sensitive or confidential information, or the processing operation executed at the SPS stageIt may include the use of proprietary algorithm, if it is problematic that the proprietary algorithm is had found that it is likely that by rival.Client may be becauseThis is concerned about the safety of the various species of flow management and processing operation, especially if using being located at not exclusively by client itselfResource at the provider network data center of control executes the operation.The entity organized by such as company or public sectorIt establishes to provide the addressable one or more network-accessibles of client by internet and/or other networks to distribution groupThe network for servicing (such as various types of databases based on cloud, calculating or storage service) can be referred to as herein to be suppliedQuotient's network.In some implementations, client can from for they data flow a variety of safety-related optionsIn selected.As described above, combined SPS and SMS configuration may include the node for belonging to a variety of different functional types, such asNode, SMS memory node, SMS retrieval node and SPS processing or work are absorbed for the control node of SMS and/or SPS, SMSMake node.In some embodiments, it may include for various types of that makes, which can be used for the selection based on safety of client,The placement of the node of type and the option in place.For example, in one embodiment, client can request be located at clientHold the SPS work section that one or more processing stages for flowing workflow are realized at the computing device on all facilitiesPoint, even if stream record is acquired and/or stored using the resource being located at provider network.In response to this placementRequest, the node of the different functional types for given stream can be in the respective resources with unused security feature or featureIt is instantiated at set.
In different implementation scenarios, the resource collection can be different from each other in various safety-related characteristics,Including such as physical location, physical security agreement currently in use (such as it has the physical access to resource), Network IsolationGrade (such as the network address of resource is to visible level of various entities), multi-tenant are to single tenant etc..In some embodimentsIn, client can establish the virtual network (IVN) of isolation in provider network, wherein given client is endowed pairThe substantive control of network configuration including the various devices in the IVN of that client.Specifically, client can be withCan limit to the network address of each server in the IVN for distributing to them or calculated examples (for example, Internet protocol orIP address) access.In such an implementation, client can request certain subsets in their SMS or SPS node to existIt is instantiated in specified IVN.In the supplier for such as virtualizing example host (it can be commonly configured to multi-tenant host)Internet resources are used in the embodiment of SMS or SPS node of various species, and client can request real on example hostThe some group nodes of exampleization, the example host are realized example (the i.e. some SMS or SPS nodes for only belonging to client by limitingIt can be realized at the example host for being configured to single tenant's host).
In some embodiments, as another safety-related option, client can request the data of specific streamBe recorded in front of transmitting them in network linking and encrypt, for example, at SMS, in intake subsystem and storage subsystemBetween, between storage subsystem and retrieval subsystem, between retrieval subsystem and SPS working node and/or work saveIt is encrypted before intake between point and SPS output destination.In some embodiments, client may specify ready for use addClose algorithm.In one embodiment, the safety net of such as TLS (Transport Layer Security) agreement or SSL (Secure Socket Layer) agreementNetwork agreement can be used for data record transmission and/or be used for transmission SPS processing result.
Data flow concept and general introduction
Fig. 1 is provided to be summarized according to the simplifying for data flow concept of at least some embodiments.As shown, stream 100 canTo include multiple data records (DR), such as DR 110A, DR 110B, DR 110C, DR 110D and DR 110E.Such as dataThe executable write-in of one or more data manufacturers (alternatively referred to as data source) of manufacturer 120A and 120B 151 (such as write behaviourMake) with the content of the data record of generation stream 100.Many different types of data manufacturers can produce in different implementation scenariosRaw data flow, such as mobile phone or tablet computer application program, sensor array, social media platform, record apply journeySequence or system recording-member, different types of monitoring agent etc..One or more data consumer (such as data consumer 130AAnd 130B) content for reading 152 (such as read operations) to access the data record generated by data manufacturer can be performed.SomeIn embodiment, data consumer may include the working node in such as stream process stage.
In at least some embodiments, the given data record being such as stored in SMS may include that data (such as are distinguishedBe DR 110A, DR 110B, DR 110C, DR 110D and DR 110E data portion 101A, 101B, 101C, 101D and101E) and sequence number SN (such as be respectively DR 110A, DR 110B, DR 110C, DR 110D and DR 110E SN 102A,SN 102B, SN 102C, SN 102D and SN 102E).In the embodiment of description, sequence number, which can indicate to receive DR, to flowSequence at management system (or at the specific node of Workflow Management System).In some implementations, data may include notThe unaccounted byte sequence become: that is, 151 (such as write operations) of write-in are once completed, since generated DR is writtenContent can not be changed by SMS, and usually SMS may not know the semantemes of data.In some implementations, given stream100 different data records may include different data volumes, and in other implementations, all data records of given streamThere can be same size.In at least some implementations, the node of SMS (such as intake subsystem node and/or storage subsystemSystem node) it can be responsible for generating SN.As described in further detail below, the sequence number of data record does not need always continuous's.In one implementation, it can provide minmal sequence number as the part of write request, client or data manufacturer and needInstruction for corresponding data record.In some embodiments, data manufacturer can be for example by with providing storage deviceLocation (offset in such as device name and device) can obtain the network address (such as URL) of data portion from it to submitCover the write request of the pointer (or its address) of the data portion of data record.
Flow management service can be responsible for receiving data from data manufacturer, the storage data and make data consumptionPerson can access the data in various embodiments with one or more access modules.In at least some embodiments,Stream 100 can be partitioned or " fragmentation " is with the workload of distribution reception, storage and retrieved data record.In this embodimentIn, subregion or fragment can be selected based on one or more attributes of data record for incoming data record, andWait absorb, store or the specific node of retrieved data record can be identified based on the subregion.In some implementations, dataManufacturer can provide the specific subregion key that can be used as zone attribute with respective write operation, and this key can be reflectedIt is incident upon partition identifier.In other implementations, SMS can be based on such as the IP of the identity of data manufacturer, data manufacturerThis kind of factor of location is based even on the data content of submission to infer partition id.By some realization sides of data flow subregionIn formula, sequence number can be allocated on the basis of by subregion, such as although sequence number can indicate to receive the data of particular zonesThe sequence of record, the sequence number of data record DR1 and DR2 in two different subregions, which may be not necessarily indicative, receives DR1 and DR2Relative ranks.In other implementations, sequence number can be wide in stream rather than be distributed on the basis of by subregion, so that such asThe sequence number SN1 that fruit distributes to data record DR1 is less than the sequence number SN2 for distributing to data record DR2, this will imply that DR1It is received earlier compared with DR2 by SMS, regardless of which subregion DR1 and DR2 belongs to.
By SMS support retrieval and read interface allow data consumer in various embodiments unceasingly and/orWith random-sequential access data record.In one embodiment, the reading application programming based on iterator can be supported to connectMouth (API) group.Data consumer can submit request to obtain the iterator for data flow, wherein the initial bit of the iteratorIt sets by assigned serial number and/or partition identifier instruction.After initiator is instantiated, data consumer can submit request withBy sequential order reads data log since the initial position in the stream or subregion.In such an implementation, such asFruit data consumer is wished with some random sequence reads data logs, then new iterator may have to be for reading every timeIt takes and instantiates.In at least some implementations, it can will be given usually using the sequence write operation for avoiding disk from seekingThe data record of subregion or stream is sequentially written in the storage device based on disk with sequence number.Sequence read operation also can avoid disk and seekThe expense in road.Therefore, in some embodiments, data consumer can be used price incentive to encourage to execute and compare random writeIt takes more sequences to read: being read for example, the random access read operation of such as iterator instantiation can have than sequential accessOperate higher associated charge rate.
Exemplary system environment
Fig. 2 is provided according at least some embodiments in Workflow Management System (SMS) and adopting including the stream process stageThe general introduction of data flow among each sub-components of the stream processing system (SPS) of collection.As shown, Workflow Management System (SMS) 280It may include intake subsystem 204, storage subsystem 206, retrieval subsystem 208 and SMS control subsystem 210.As described below,Each of SMS subsystem may include for example using in provider network (or client all or third party's facility)The one or more nodes or component that the corresponding executable thread or process of various Energy Resources Service's instantiations are realized.Absorb subsystem204 node can be configured based on the partitioning strategies for the stream by (for example, by node of SMS control subsystem 210)To obtain the data record of the specific data stream from data manufacturer (such as 120A, 120B and 120C), and each intakeReceived data record can be transferred to the corresponding node of storage subsystem 206 by node.Storage subsystem node can be according to selectionData record is stored in any different types of storage device by the persistence strategy for the stream.Retrieval subsystem 208Node can respond the read requests from data consumer, such as working node of stream processing system 290.It can be by means ofSPS control subsystem 220 is established processing stage (such as stream process stage), the processing stage 215A of such as stream processing system 290,215B, 215C and 215D.Each processing stage may include being configured in received data record by SPS control subsystem 220Execute one or more working nodes of one group of processing operation.As shown, some processing stages (such as 215A and 215B) canData record directly is obtained from Workflow Management System (SMS) 280, and other processing stages (such as 215C and 215D) can be from otherStage receives their input.In some embodiments, multiple SPS stages can parallel work-flow, such as in processing stage 215AAt processing stage 215B, different processing operations can be performed simultaneously from the data record that identical stream retrieves.It should infuseMeaning, similar to it is shown in Figure 2 for it is specific stream those of corresponding subsystem and processing stage can also for other stream andInstantiation.
In at least some embodiments, at least some nodes of subsystem shown in Figure 2 and processing stage be can be usedProvider network resource is realized.As previously noted, it is established by the entity that such as company or public sector organize logical to provideThe addressable one or more Network Accessible Services of client for crossing internet and/or other networks to distribution group are (such as eachDatabase based on cloud, calculating or the storage service of seed type) network can be referred to as provider network herein.It is someService can be used to construct higher levels of service: for example calculate, store or database service can be used as flow management service orThe building module of stream process service.At least some kernel services of provider network can be in the service unit for being known as " example "It is packaged and is used for client: for example, can be represented " calculated examples " by the virtual machine that virtualization calculates service instantiation, and" storage example " such as can be referred to as by the storage device of the block grade volume of storage device instantiation or data base administration takesBeing engaged in device can referred to as " database instance ".Computing device can be referred to as " example host " herein or more simply claimFor " host ", the computing device can such as realize this unit of the various Network Accessible Services of provider network at whichServer.In some embodiments, subsystem 204, storage subsystem 206, retrieval subsystem 208, SMS control system are absorbedThe node of system 210, processing stage and/or SPS control subsystem 220 may include each calculated examples on multiple example hostsLocate the thread executed or process.Given example host may include several calculated examples, and the meter at particular instance hostThe acquisition for calculating example can be used to realize the node of a variety of different streams for one or more clients.In some implementationsIn scheme, storage example can be used to store the data record of various streams, or the purpose of the result as the stream process stageGround.It is described below with reference to Figure 15 and Figure 16, with the variation of time, control subsystem node may be in response to various triggering itemsPart dynamically modifies the group of other subsystems, such as by adding or removing node, concept transfer to processing example or calculatingThe mapping of example or example host or again subregion given stream continue to receive simultaneously, store and process data record.
In the content that provider network resource be used to flow the embodiment of relevant operation, term " client " is when being used asIt can refer to when the source or destination of given communication by entity (such as tissue, the group with multiple users or single user) instituteThere are, manage or be dispatched to any one of computing device, process, hardware module or software module of the entity, the realityBody be able to access that and using provider network at least one Network Accessible Service.A kind of client of service can be made with itselfIt is realized with the resource of another kind service, such as flow data consumer (client of flow management device) may include calculated examples(resource that service provides is calculated by virtualization).
Given provider network may include that (it can spread different ground for many data centers of the various resource pools of trustshipRegion is managed to distribute), such as physics and/or virtual computer service device, each storage with one or more storage devicesThe foundation structure and service that the acquisition of server, the network equipment etc. needs to realize, configuration and distribution are provided by supplier.ThisIn embodiment, many different hardware and/or software component can be used to realize each in the service jointly, described hardSome in part and/or software component can be instantiated or hold at different data centers or in different geographic areasRow.Client can be with the resource and service interaction at provider network, and the resource and service are outside provider networkPositioned at client is all or the device and/or the dress in provider network of the place of client-side management or data centerIt sets.It should be noted that although provider network, which is used as, can wherein realize that one kind of many flow managements as described herein and processing technique is shownExample property content, those technologies also apply to the other kinds of distributed system other than provider network, such as usingTo the large-scale distributed environment operated by individual enterprise's entity for the application program of its own.
Programming interface embodiment
As indicated above, at least some embodiments, SPS can construct more Gao Shui using SMS programming interfaceFlat function, the function can be used more easily by SPS client to realize for the various application programs based on streamRequired business logic.When in view of the difference between SPS function and SMS function, it may be useful for analogizing.SPS function can be withGenerally be compared in the programming language structure of higher levels of language (such as C++), and SMS function can generally withAssembly language directive is compared, and programming language structure is converted to assembly language directive by compiler.Perhaps it is possible that directlySame operation is realized using assembly language directive, but may be usually easier in the programming of higher level language for manyThe consumer or user of type.Similarly, perhaps there is a possibility that realize various application programs with the primitive provided by SMS, butIt is that this may be more easily accomplished by using SPS feature.(the processing of the idempotent executed in such as data record of SPS processing operationOperation) it can be realized on the data content of stream record, and SMS operation is performed and records itself with acquisition, storage and retrieval, andThe content of the record is not considered usually.Fig. 3 shows achievable corresponding at the SMS SPS according at least some embodimentsThe example of group programming interface.By example, many different application programming interfaces (API) are indicated for SMS and SPS.The API shown is not intended to be the full list that API those of is supported in any given implementation, and the API shownIn some may not be supported in given implementation.
As indicated by arrow 350, SPS client 375 can call SPS programming interface 305 to configure processing stage.RespectivelyThe SPS programming interface 305 of seed type can be realized in different implementation scenarios.For example, the creation stream process stage(createStreamProcessingStage) API can enable a client to request the new place for specifying inlet flowThe configuration in reason stage so that each of the working node in the stage be configured to carry out it is specified in interface callingOne group of idempotent operation, and result is distributed to the destination by output distribution descriptor or strategy instruction.In creation stream processIn some versions of stage API or its equivalent, client can also request creation inlet flow, and in other versions, inputStream may have to create before generating processing stage.Recovery policy can be specified for working node, to for example refer toShow whether the recovery technology based on check point has to be used or whether recovery technology is preferred as possible.In some embodimentsIn, initial work node (initializeWorkerNode) API can be supported, to request working node at specified phasesClearly instantiate.In the embodiment for realizing the recovery based on check point, it can support to save check point(saveCheckpoint) API, to allow client request to generate progress record by working node.
Various types of SPS outgoing management API, such as setting output distribution can be supported in different implementation scenarios(setOutputDistribution) API, client export distribution API by the setting and can indicate using in specified phasesLocate the result of the processing operation executed and the particular zones strategy of newly created stream will be used for and one or more to be createdStream.Some processing stages can be mainly configured to subregion again, such as be mapped data record based on record attribute collection A1A kind of sectoring function PF1 to N1 subregion can be used for inlet flow S1, and processing stage can be used to realize different subregionsThose identical data records are mapped to N2 points with (using different property set A2 or identical property set A1) by function PF2Area.The some SPS API for such as linking stage (linkStage) can be used to the arbitrary graphic that configuration includes multiple stages(such as directed acyclic graph).In some embodiments, the connection of third party or increase income stream process frame or service can be supportedDevice.In a this embodiment, it can be used for that (such as the processing operation by executing at the stage is suitable the SPS stageWhen the result of formatting) prepare data record, for the consumption by existing third party or open source system.In the embodiment party of descriptionIn case, the API of such as creation third party's connector (createThirdPartyConnector) can be used for establishing this connectionDevice, and the result in SPS stage can be executed by one or more connector modules to the format compatible with third party systemAppropriate conversion, one or more of connector modules are instantiated as the result that creation third party's connector calls.
SPS can call SMS programming interface 307 to execute at least some of its function, as indicated by arrow 352.In the embodiment of description, SMS programming interface 307 may include such as creation stream (createStream) and deletion stream(deleteStream) (be respectively created and delete stream) and obtain stream information (getStreamInfo) (with obtain for flowingMetadata, be such as responsible for the network address of the various types node of given subregion).Placing record (putRecord) interface canFor data record is written, and obtain iterator (getIterator) and obtain next record (getNextRecord) interface canIt is respectively used to non-sequential and sequence reading.In some embodiments, subregion stream interface can be used for requesting specified stream againDynamic subregion again.Wish that the client 370 done so can call directly SMS programming interface 307, as indicated by arrow 354's.As previously indicated, various other SMS and/or SPS API can also be realized in other embodiments, and in some realitiesApply can not be realized in scheme it is some in the API listed in Fig. 3.
In various embodiments, the programming interface other than API can also with or instead be implemented to SPS orSMS.This interface may include graphical user interface, webpage or website, command line interface etc..In some cases, it is based on networkInterface or GUI can be used API as building module, such as network-based interaction can be at the control unit of SMS or SPSGenerate the calling of one or more API.Fig. 4 shows the exemplary network-based interface according at least some embodiments, instituteStating interface may be implemented such that SPS client can generate the figure in stream process stage.As shown, interface includes having messageWebpage 400, EFR STK area 404 and the graphic designs area 403 in area 402.
User can be provided the normal instruction of the building about the stream process figure in message area 402, and can useIn the study more link about stream concept and primitive.Many graphic icons can be provided as at stream in EFR STK area 404Manage the part of graphical tool collection.It is held for example, client can be allowed to instruction as the input or output of each SPS processing stageAfterflow 451, temporary current 452 or the connector 453 to third party (such as third party's processing environment).About network-based interfaceIt is implemented to its SPS/SMS, its data record can be defined as and be stored in persistent storage by persistently flowing 451Stream, the persistent storage such as disk, non-volatile ram or SSD, and temporary current 452 can be defined as itData record does not need to be stored in a kind of stream at persistent storage.Temporary current can be for example from the output in SPS stageIt generates, the output is expected the different SPS stages by recovery policy as possible to be achieved as input consumption.
Two kinds of processing stage is supported in exemplary SPS figure building webpage 400: using based on the extensive of check pointThe processing stage 455 of multiple strategy is (for example, each working node saves progress record every now and then, and in the event of particular job nodeIn the case where barrier, which data record replacement node starts to process with reference to the progress record of the node of failure with determination);AndUsing the processing stage 457 of recovery policy as possible, wherein using restoring as possible (for example, replacement working node is remembered without reference to progressRecord, but only start to handle it when receiving new data record).Execution at each stage is stayed in about havingThe details of processing operation can be entered and clicking in the corresponding icon in graphic designs area 403, such as by message area 402Instruction indicated by.In addition to for flowing, the icon of connector and processing stage, EFR STK area 404 further includes at third partyReason system 459 (such as third party or external stream processing system), and the storage device section that can be realized at provider networkPoint 461 (such as one or more nodes), the resource of the provider network is used for the processing stage.
Under the Exemplary contexts being shown in FIG. 4, client has constructed figure 405, and the figure 405 is in graphic designs areaIt include three processing stages 412,415 and 416 in 403.It is configured to make using the processing stage 412 of the recovery based on check pointUse lasting stream 411 as input.The output of processing at stage 412 or result are sent to two destinations: being in formation stagesThe form of the different lasting streams 413 of 415 input;And in the form of the temporary current 414 of the input of formation stages 416.RankSection 415 and 416 all uses recovery policy of doing the best for their working node.The output in stage 415 is by the form of temporary current 418It is sent to storage service node 419.The output of processing stage 416 is sent to third method, process system by connector 417459." saving figure " button 420 can be used to for example save indicating for processing stage figure with any appropriate format, describedFormat such as JSON (JavaScript object labelling method), XML (extensible markup language) or YAML.In various embodimentsIn, arbitrarily complicated processing workflow can be used similar to those tools shown in Fig. 4 and construct.It is created using this toolThe workflow built then can be activated, and this activation can cause the calling of SMS API, such as to obtain for handling rankThe data record of section (stage 412 of such as Fig. 4), obtains iterator interface and/or the next record interface of acquisition can be in lasting streamIt is called on 411.
Fig. 5, which is shown, submits interface and record inspection according to the achievable programmed recording at SMS of at least some embodimentsThe example of rope interface.In the embodiment of description, the data record of all DR 110K as shown and DR 110Q can be by eachThe programming intake interface 510 of seed type submits to SMS.In some embodiments, DR may include the element of four seed types: fail to be sold at auctionKnow symbol, such as stream ID " S1 " 501A or stream ID " S2 " 501B;The data of record or the instruction of ontology;Optional client supplyThe subregion key 504A or subregion key 504B of optional client supply;And optional sequence preference indicator is (such as optionalThe sequence preference indicator 506B of the sequence preference indicator 506A of client supply and optional client supply).SomeIn data record, data itself can provide (such as online data 502 of DR 110K) online, and can for other data recordsPointer or data address 503 are provided, to indicate that network-accessible place (or does not need the local dress of network transmission for SMSSet the address at place).In some embodiments, given stream can support online data record to submit and with reference to (based on address) numberIt is submitted according to record.In other embodiments, given stream may need data manufacturer to supply all online data or instituteSome reference datas.In some implementations, data record submission may include the partition identifier for being ready to use in the record.
In the embodiment of description, incoming data record can be directed to based on partitioning strategies corresponding intake and/Or memory node.Similarly, record retrieval is also possible to based on subregion, such as one or more retrieval nodes can be designatedFor the read requests in response to the record for given subregion.For some streams, data manufacturer, which may need to provide, to be hadThe specific subregion key of respective data record write request.For other streams, SMS can be distributed according to partition schemeData record, the partition scheme depends on metadata or the attribute other than the subregion key clearly supplied, for example, with submissionThe related identification information of data manufacturer can be used as subregion key, or the portion of the IP address of the data manufacturer submitted can be usedDivide or all, or the part for the data submitted can be used.In some implementations, for example, hash function can be answeredWith obtaining a certain size integer value, such as 128 integers to subregion key.The size (such as from 0 to 2^128-1) is justThe full scope of integer can be divided into N number of continuous subinterval, wherein each subinterval represents corresponding subregion.Therefore,In such instances, determine or for be applied to data record any given subregion key will be hashed into corresponding 128 it is wholeNumber, and the continuous subinterval of 128 integers of the integer subordinate can indicate the section of the data record subordinate.AboutPartitioning strategies and their the other details used are provided hereinafter with reference to Figure 15.
It is responsible for intake or receives the data record of particular zones, stores the data record and in response to for the spyThe group node for determining the read requests of subregion is collectively referenced as the ISR for the subregion being used in Fig. 5 (intake, storage and retrieval) sectionPoint.Label Sj-Pk is used to refer to k-th of subregion of stream Si.In the shown embodiment, intake/storage/retrieval (ISR)Node 520A is configured for the record of intake, storage and retrieval subregion S1-P1, and ISR node 520B is established for subregion S1-The record of P2, ISR node 520C are established for the record of subregion S1-P3, and ISR node 520K is established for subregion S2-P1Record, and ISR node 520L is established for the record of subregion S2-P2.In some embodiments, absorb subsystem,The given node of storage subsystem or retrieval subsystem can be configured to handle more than one subregion (or more than one flowBe more than a subregion) data record.In some embodiments, the record of the single subregion of given stream can be by more than oneA node intake, storage or retrieval.Specify the quantity of the intake node for giving subregion Sj-Pk can be at least some situationsThe lower quantity for being different from specifying the intake node for different subregions Sj-Pl, and can also be different from specifying for Sj-Pk'sThe quantity of memory node and/or the quantity for specifying the retrieval node for Sj-Pk.In some embodiments, about intake and/Or retrieval, SMS control node can realize API (such as acquisition stream information), to allow client to determine which which node is responsible forSubregion.The mapping configured between data record and subregion and between subregion and ISR node (or control node) can be at any timeAnd modify, it is as follows in about dynamically again described in the discussion of subregion.
In some embodiments, several different programming Retrieval Interfaces 580 can realize for from given area search orReading flow data record.As shown in Figure 5, some non-sequential Retrieval Interfaces 581 can be realized for non-sequential access, such as be obtainedIterator interface (instantiate iterator or reading pointer at the data record with assigned serial number or thereafter) or obtain(getRecord) interface must be recorded (read the data record with assigned serial number).Other ordered retrieval interfaces 582 can be realIt is current in ordered retrieval, such as obtain next record interface (its be request from the current location of iterator according to increase sequence numberSequence read the interface of N number of record).In the storage system based on spinning disk, as previously mentioned, sequential I/O is being permittedIt is much effective than random I/O in more situations, because the quantity that the magnetic disk head needed on averagely each I/O is found is for sequenceIt can be usually more much lower than random I/O for I/O.In many embodiments, the data record for giving subregion can be suitable with sequence numberSequence write-in, and therefore the sequence read requests based on sequence number sequence (such as using obtaining next record interface or similar connectMouthful) much effective than random read requests.In at least some embodiments, therefore, different charge rates can be setFor sequence to non-sequential Retrieval Interface, such as more expenses may be charged for for non-sequential reading client.
Absorb subsystem
Fig. 6 shows the exemplary elements of the intake subsystem 204 according to the SMS of at least some embodiments.In descriptionIn embodiment, intake operation is logically divided into front-end functionality and back-end function, and wherein front-end functionality is related to raw with dataThe interaction of business men (such as 120A, 120B or 120C), and back-end function is related to the interaction with SMS storage subsystem.Before thisThe separation of end/rear end can have the advantages that several, such as strengthen the safety of storage subsystem and avoid having to dataThe details of manufacturer's offer partitioning strategies.SMS client library 602 can provide for the installation in various data manufacturers, andAnd data manufacturer can call the programming interface for including in SMS client library 602 to absorb to submit data.For example, at oneIn embodiment, data manufacturer may include the example at the hundreds and thousands of a physics and/or virtual server of provider networkThe record or monitor agent of change.This agency can acquire various log informations and/or index at their corresponding servers,And the message of acquisition or index are periodically submitted to the front end instantiated by one or more intake control nodes 660 of SMS604 endpoint of load divider.In some embodiments, one or more virtual ip address (VIP) can establish divides for loadingFlow data can be submitted to load divider by orchestration, data manufacturer.In one implementation, Circling DNS (domain name system)Technology can be used for VIP, and to select certain loads distributor from the load divider of several comparable configurations, data need by dataManufacturer is sent to the load divider.
In the embodiment of description, received data record can be guided to several front end nodes (such as 606A, 606BAny one of or 606C).In at least some embodiments, front end load distributor 604 may not known about for dataThe partitioning strategies 650 of record, and front end node can be therefore by using round robin load-balancing (or some other general loadsEqualization algorithm) rather than load balancing based on subregion and be selected for data-oriented record.Front end node, which can be appreciated that, to be used forThe partitioning strategies 650 of various streams, and can with intake control node 660 interact with obtain specified backend nodes (such as 608A,608B or 608C) identity, the backend nodes are configured for the data record of given subregion.Therefore, in the implementation of descriptionIn scheme, data record is transmitted to by the respective partition that front end load distributor 604 can be each based on data record institute subordinateMultiple backend nodes.As previously noted, the subregion of data record institute subordinate can be determined based on any combination of various factors,The identity of subregion key, such as data manufacturer that the factor is such as supplied by data manufacturer or address it is one or more itsThe content of his attribute or data.
Backend nodes can respectively receive the data record for being subordinated to one or more subregions of one or more streams, and by instituteState one or more nodes that data record is transmitted to storage subsystem.In some embodiments, backend nodes can be claimedFor " placing (PUT) server ", wherein data are serviced API by HTTP (hypertext transfer protocol) " placement " network and are submitted.Given backend nodes can determine storage subsystem node collection, and data record needs by looking into the intake submission of control node 660Asking and being transmitted to the storage subsystem node collection (is the node collection by separating in the control function for being wherein used for different sub-systemsIn the embodiment of processing, this transfers that corresponding inquiry can be submitted to the control node of storage subsystem).
In at least some embodiments, it can support many different intake ACK strategies 652, such as absorb at least onceStrategy absorbs strategy as possible.In strategy at least once, data manufacturer may need to the data record of each submissionPositive assurance, and repeatably submit identical data record (if not receiving the confirmation submitted for the first time) until finalReceive confirmation.As possible absorb strategy in, at least some data records of submission may not be needed positive assurance (althoughIntake subsystem may still provide confirmation once in a while, or may be in response to the clear request to confirmation from data manufacturer).In some embodiments that wherein intake subsystem 204 needs to provide confirmation for data manufacturer, before generating confirmation, bearThe backend nodes of duty data-oriented record may wait for the requirement until having successfully created data record at storage subsystemCopy (for example, according to establish be used for the stream persistence strategy).In various embodiments, sequence number can be by intakeSystem, which generates, is used for received each data record, such as indicates that the record is remembered relative to other of identical subregion or streamThe sequence of intake is recorded, and this sequence number can be used as confirmation or the part as confirmation returns to data manufacturer.About sequenceThe other details of row number is provided hereinafter with reference to Figure 13 a and Figure 13 b.In some implementations, confirmation and/or sequence numberData manufacturer can be transmitted back to by front end node.In at least one implementation, strategy can be in intake subsystem at least onceUnite itself front end node and backend nodes between realize, such as given front end node can be for backend nodes appropriate repeatedlyData record is submitted, until backend nodes provide confirmation.
Intake control node 660 can be responsible in addition to other function: instantiation front end node and backend nodes, monitoring nodeHealth and workload are horizontal, coordinate fault as needed shifts, provide to the inquiry which node to be responsible for given subregion aboutResponse or the response that policy-related (noun) is inquired, the relevant configuration behaviour of intake for the dynamic from stream again subregionMake.In some embodiments, specify the quantity itself of the intake control node of the given collection for one or more stream can be withTime and change, such as one or more main control node can be responsible for as needed to reconfigure control node pond.WhereinRedundant group be established for intake front end node or backend nodes some embodiments in, it is as follows in about Fig. 9 and Figure 10It being described in further detail, intake control node 660 can be responsible for tracking which node is primitive and which is non-primitive,The trigger condition of failure transfer is directed to and when needing failure to shift for selecting alternative for detecting.It should be noted that oneMultilayer intake subsystem architecture shown in Fig. 6 can not be realized in a little embodiments, such as can only be matched under some situationsSet single group intake node.
Storage subsystem
Fig. 7 shows the exemplary elements of the storage subsystem of the SMS according at least some embodiments.As shown, taking the photographTaking node 608 (such as in wherein front-end and back-end intake responsibility is that rear end in the embodiment handled by different group nodes is taken the photographTake node) data record of one or more subregions of stream can be transmitted to the corresponding storage section for being configured to those subregionsPoint.For example, the DR 110A of subregion S1-P1 is sent to memory node 702A, the DR110B of subregion S2-P3 is sent to storageNode 702B and 702C, the DR 110C of subregion S3-P7 are sent to memory node 702D, and the DR 110D of subregion S4-P5It is initially sent to memory node 702E.Storage control node 780 can be responsible for: implement to be applied to the not data record of cocurrent flowPersistence strategy 750 configures and reconfigures as needed memory node, monitoring memory node state, the transfer of management failure, ringsIt should be inquired in storage configuration or storage strategy is inquired and various other management roles in the embodiment of description.
In different implementation scenarios, persistence strategy 750 can be different from different ways each other.For example, usingTo stream Sj persistence strategy P1 can in the following areas in be different from being applied to the tactful P2:(a of stream Sk) it is each of to be storedThe quantity of the copy of data record;(b) copy needs to be stored type (such as the copy of storage device thereon or systemWhether wait store volatile memory, non-volatile cache, the storage device based on spinning disk, solid state drive(SSD), various types of memory has, in all kinds of RAID (redundant array of inexpensive disk), if wait store data base management systemIn, in node of storage service realized by provider network etc.);(c) geographical distribution of the copy is not (such as byWith copy is placed in data center, whether flow data is recoverable for extensive failure or certain type of disaster);(d)Write-in confirmation agreement (for example, if need to be stored N number of copy, then should will confirm that be supplied to intake node before must succeedHow many copies in N number of copy are written);And/or (e) in the case where the multiple copies for thering is data to be stored to record,Whether the copy should in parallel or sequentially be created.In some cases for the multiple copies for thering is data to be stored to recordUnder, such as in the case where DR 110D, data record can be transmitted to another memory node and (such as deposited by given memory nodeThe DR 110D for being used to further replicate is sent to memory node 702F by storage node 702E, and memory node 702F by its afterIt is continuous to be sent to memory node 702G).In other for using more copy persistence strategies, such as need to be stored two for itIn the case where the DR 110B of copy in a memory, intake node can concurrently start multiple copies.In at least some implementationsIn scheme, the persistence strategy of the selection of client can not be assigned with the type for being ready to use in the storage location of flow data record;On the contrary, SMS can select memory technology and/or the place of appropriate type, the standard such as cost, property based on various standardsCan, to the degree of approach, the durability demand of data source etc..In one embodiment, client or SMS can determine using forThe different subregions of constant current or different memory technologies or storage location type for not cocurrent flow.
In the example being shown in FIG. 7, the persistence strategy for being applied to stream S1 (or the subregion S1-P1 at least flowing S1) isThe single strategy of copy in memory, and be that flow S2 application is the two parallel strategies of copy in memory.Therefore, DRDR copy 704A in the memory of 110A is created at memory node 702A, and corresponding in two memories of DR 110BDR copy 705A and 705B concurrently created at memory node 702B and 702C.For flowing the DR 110C of S3, creation is singleDR copy 706A on disk.For flowing S4, can application order strategy of three copies on disk, and therefore storingDR copy 707A, 707B and 707C on corresponding disk are sequentially created at node 702E, 702F and 702G.In different realitiesIt applies in scheme, the persistence strategy of various other types can be applied to data flow.The node of retrieval subsystem is in response to variousThe retrieval API of type can obtain data record from memory node appropriate by the calling of data consumer.
Retrieval subsystem and processing stage
Fig. 8 shows the exemplary elements and retrieval subsystem of the retrieval subsystem of the SMS according at least some embodimentsWith the example of the interaction of SPS.As shown, retrieval subsystem 208 may include multiple retrieval nodes, such as retrieval node 802A,802B and 802C, and the set of retrieval control node 880.Each of retrieval node can be configured to response from eachThe flow data retrieval request of other data consumers of kind, the working node of SPS such as described below.In different embodiment partyIn case, a variety of programming Retrieval Interfaces can be realized by retrieval node, all non-sequential as previously described and sequence Retrieval Interfaces.InIn some embodiments, the network service API that such as HTTP obtains (GET) request can be used for data record retrieval, and retrieveTherefore node can be referred to as obtains server.In the embodiment of description, given retrieval node can be controlled for example by retrievalNode 880 configures, one or more to obtain from the storage subsystem node (such as memory node 702A and 702B) suitably organizedThe data record in flow point area.
In the embodiment of description, retrieval node can be interacted with one or more memory nodes, and be additionally in response to fromOne or more received retrieval requests of SPS working node.For example, the data record (such as DR 110K) of subregion S4-P5 and pointThe data record (such as DR 110L) of area S5-P8 is read by retrieval node 802A from memory node 702A, and is mentioned respectivelySupply working node 840A and 840K.The data record (such as DR 110M) of subregion S6-P7 by by retrieval node 802B from storageNode 702A is read, and is supplied to working node 840K.The data record DR 110N of subregion S4-P7 is by by retrieval node 802CIt is read from memory node 702B, and is supplied to working node 840B, and be also provided to other data consumers (for example, directlyThe data consumer for calling SMS retrieval API rather than being interacted by SPS with SMS).
In at least some embodiments, retrieving in node some or all of can be achieved corresponding cache (such asIt retrieves at the DR cache 804A at node 802A, the DR cache 804B at retrieval node 802B and retrieval node 802CDR cache 804C), wherein the data record of each subregion anticipates that the retrieval request in future can temporarily retain.RetrievalControl node 880 can be responsible for realizing many retrieval/cache policies 882, including for example (such as high speed is slow for cache policiesDeposit should be configured it is much for give subregion, data record should be cached how long), memory node selection strategy (exampleWhich particular memory node such as should be contacted at first in the situation of data record for storing multiple copies to obtain to fixed numberAccording to record) etc..In addition, retrieval control node can be responsible for: instantiating and monitoring retrieval node, response are born about which retrieval nodeIt blames the inquiry of which subregion, start or in response to division operation again etc..
In the example shown, stream processing system 290 includes two processing stages: 215A and 215B.SPS control node885 can be responsible for instantiating working node 840A, 840B, 840K at each processing stage, such as handle the record of subregion S4-P5Working node 840A, handle subregion S4-P7 record working node 840B and handle subregion S5-P8 and S6-P7 recordWorking node 840K.SPS control node 885 can realize programming interface (those interfaces shown in such as Fig. 3 and Fig. 4), withEnable SPS client design treatment workflow.Various check point strategies 850 can realize for different processing stage orWorkflow, to indicate when or whether working node needs to be stored progress record, the progress record indicates the workMake node and reaches what degree, the type for having the storage device for being ready to use in progress record etc. in handling their corresponding subregions.Failure transfer/recovery policy 852 can indicate to will lead to using different nodes the trigger condition or threshold value of replacing working node,And restore whether to there is recovery to be used or based on check point whether to have to be ready to use in given processing stage as possible.At leastIn some embodiments, SPS control node 885 can be interacted with various types of SMS control nodes, such as be needed with identification from itObtain the retrieval node of the data record of given stream, the new temporary current that foundation may need particular procedure workflow orLasting stream etc..In at least one embodiment, client can be interacted with SPS control node to instantiate stream, such as some visitorsFamily end may want to the SPS interface for only calling higher level rather than utilize SMS control interface.Although should be noted that Fig. 6, Fig. 7 andSeparated control node collection is shown for SMS intake, storage and retrieval subsystem in Fig. 8, and for the SPS stage, at leastThe control node given in some embodiments can be used for several subsystems and/or SPS.
Node redundancy group
In at least some embodiments, the redundant group of node can be configured for one or more subsystems of SMS.That is, instead of for example configuring a retrieval node for two or more can be established for flow point area Sj-Pk retrieved data recordMultiple nodes are used for this retrieval, and one of node is in time authorized " main " or positive role in set point, andOther node or multiple nodes are designated as " non-principal " node.Current main node can be responsible for responsive operation request,Such as from client or from the received request of the node of other subsystems.A non-staple node or multiple nodes can be keptStop, until for example due to failure, to main node internuncial loss or other trigger conditions and trigger failure turnIt moves, the non-primary node selected at that time can be taken over the responsibility of previous main node by control node notice.It is shifted in failurePeriod, therefore dominant role can be recalled from current incumbent main node, and current non-primary node is awarded.OneIn a little embodiments, timing (such as may not be needed explicitly to notify) really, non-principal section are shifted when making to break downPoint can be taken over as main node itself.In various embodiments, the redundant group of this node can be established at SMSIntake, storage, retrieval and/or control function, and at least some embodiments, it can also be taken at SPS similarMethod is used for working node.It in some embodiments, include at least one main node and at least one for give functionThis group of a non-primary node can be referred to as " redundant group " or " copy group ".It should be noted that the redundant group of memory node can be onlyIt on the spot realizes the quantity of the physical copy of the data record of storage, such as has the quantity of data to be stored transcript can be by holdingLong property strategy determines, and the quantity for being configured for the memory node of corresponding subregion can be determined based on redundancy group policy.
The example that Fig. 9 shows the redundant group for establishing the node for SMS or SPS according at least some embodiments.In the embodiment of description, for given stream subregion Sj-Pk, corresponding intake node redundancy group (RG) 905, storage section are establishedPoint RG 915, retrieval node R G 925 and control node RG 935.Control node RG is realized in the shown embodiment935, although can be realized in some embodiments for absorbing control node, storage control node or point for retrieving control nodeThe RG opened.Each RG includes that main node (such as main intake node 910A, main memory node 920A, mainly retrieves node930A and main control node 940A) and at least one non-primary node (such as non-principal intake node 910B, non-principal depositStore up node 920B, non-principal retrieval node 930B and non-principal retrieval node 940B).Turned according to corresponding intake node failureMove strategy 912, memory node failover policy 922, retrieval node failure transition strategy 932 and the transfer of control node failureStrategy 942, dominant role can be withdrawn and authorize current non-primary node.Failover policy can be managed for example: willThe trigger condition for leading to main node state change, the health status for whether and how monitoring main node or non-primary node,There is the quantity etc. of non-primary node to be configured in given redundant group.In at least some embodiments, it can establish singleRG is used for multiple subregions, such as intake node redundancy group (RG) 905 can be responsible for handling taking the photograph for the record of subregion Sj-Pk and Sp-PqIt takes.In some implementations, the node for being designated for the main node of a subregion can be designated for separately simultaneouslyThe non-primary node of one subregion.In one embodiment, multiple nodes can be designated as main in given RG simultaneouslyNode, such as the relevant workload of intake of given subregion can be allocated in two main nodes, and one of node existsNon-primary node is designated as in the case where breaking down at any one main node.The number of the node instantiated in given RGAmount may depend on available needed for corresponding function (such as being able to bear in how many concurrent or overlapping failures in described group of intention)Property or restoration it is horizontal.In some embodiments, in addition to or instead of be used for SMS node, redundant group can be established useIn the working node of SPS processing stage.The component of given RG can be geographically distributed sometimes, such as in several dataThe heart, as shown in Figure 10.In some embodiments, the control node of selection can be configured to for example using heartbeat mechanism orOther health monitoring techniques detect the condition of failure transfer triggering, and this control node can be appropriate non-principal by selectingNode carrys out coordinate fault transfer as the alternative of the main node to failure, replacement node of notice/start selection etc..
In some embodiments, provider network can be organized into multiple geographic areas, and each region Ke BaoInclude the one or more availability containers that can also be referred to as " availability area " herein.Availability container transfers to may include oneOr multiple and different places or data center, the availability container are engineered in such a way (for example, passing through independenceBasic structural member, the relevant equipment of such as power, cooling equipment, physical security member): in given availability containerFault Isolation in resource and other availability containers.Failure in one availability container may not expected can at any otherFailure is generated in property container, therefore, the availability configuration file of resource instances or control server is intended to independently of in differenceAvailability container in resource instances or control server availability configuration file.It can be by holding in corresponding availabilityStart multiple Application Instances or (in the case where some SMS and SPS) in device to spread the node of given redundant groupMultiple availability container distributions at single place to protect various types of application programs from failure.Meanwhile in some realitiesIn existing mode, in the resource being present in identical geographic area (such as the host or calculated examples of SMS and SPS node)Between can provide cheap and low latency network connection, and the network transmission between the resource of identical availability container can be withEven faster.Some clients may want to for example, referring at zone level, the horizontal place of availability vessel level or data centerThe place of the fixed flow management for retaining and/or instantiating them or stream process resource, to maintain the various portions of their application programThe control for the required degree where part accurately runs.Other clients may be for retaining or instantiating their resourceAccurate place is less interested, as long as the resource for example meets client demand for performance, high availability etc..OneIn a little embodiments, the control node being located in an availability container (or data center) can remotely configure otherOther SMS or SPS nodes in availability container (or other data centers), that is to say, that specific availability container or numberIt can not need to manage SMS/SPS node with Partial controll node according to center.
Figure 10 shows the provider network environment according at least some embodiments, wherein the node of given redundant group can divideCloth is in multiple data centers.In the embodiment of description, provider network 1002 include three availability container 1003A,1003B and 1003C.Each availability container includes some or all of one or more data centers, such as availability container1003A includes data center 1005A and 1005B, and availability container 1003B includes data center 1005C and availability container1003C includes data center 1005D.Show many different redundant groups of SMS and/or SPS node.Some RG can be singleIt is all realized in data center, such as in the case where redundant group (RG) 1012A being located in data center 1005A.Other RG canUsing the resource of multiple data centers in given availability container, such as RG 1012B, the RG 1012B crosses over availabilityThe data center 1005A and 1005B of container 1003A.However the resource throughout different availability containers can be used for other RGIt realizes.For example, RG 1012C is respectively using in the data center 1005B and 1005C of availability container 1003A and 1003BResource, and RG 1012D be utilized respectively data center 1005B in availability container 1003A, 1003B and 1003C,Resource at 1005C and 1005D.In a kind of exemplary deployment, if RG includes a main node and two non-principal sectionsPoint, each of these three nodes can be located in different availability containers, thereby, it is ensured that at least one node is very likely toIts functionality is kept, even if extensive event of failure occurs simultaneously at two different availability containers.
In the embodiment of description, SMS console associated with SMS and SPS services 1078 and SPS console respectivelyService 1076 can provide wieldy network-based interface, and for configuring, the stream in provider network 1002 is relevant to be setIt sets.Can be used resource realized in provider network 1002 many other services (its it is at least some can be by SMS and/or SPSUsing), the resource is throughout one or more data centers or throughout one or more availability containers.Such as, it can be achieved thatVirtual computing service 1072, so that client can utilize the calculating for being packaged as various different ability levels of selection quantityThe computing capability of example, and this calculated examples can be used to realize SMS and/or SPS node.One or more storages can be achievedService 1070 so that client can for example be stored by block assembly volume interface or by network service interface andAccess the data object with required data endurance level.In some embodiments, storage object could attach to virtual meterIt calculates the calculated examples of service 1072 or can be accessed from it, and can be used to realize various streams at SMS storage subsystemPersistence strategy.In one embodiment, one or more database services of such as high-performance key assignments DBMS1074, orRelevant database service can be realized at provider network 1002, and this database service can be used to SMNS storageSubsystem stores flow data record, and/or for storing control subsystem, intake subsystem, storage subsystem, retrieval subsystemOr the metadata of processing stage.
Flow secure option
In at least some embodiments, the user of SMS and/or SPS can be provided for a variety of safety of data flowRelevant options, so that client can select the secure configuration file (such as virtual machine or physical machine) of resource, the moneySource is ready to use in various functional types, such as absorbs, stores, retrieval, handling and/or control.This option may include for example closingIn the type of the physical location of the resource for various nodes selection (such as, if need using provider network facility, orThe facility whether person has client to be used all, any facility can have the Special safety different from provider network facilitySign), about flow data encryption selection and/or in the various pieces of stream process foundation structure Network Isolation selection.OneA possibility that a little clients may worry effractor or attacker, the effractor or attacker get valuable proprietary quotientThe access of industry logic or algorithm, such as and may want to realize at stream using the computing device in all places of clientManage working node.Need to be can be described as herein for realizing the type of the resource of one group of SMS and/or SPS node for those" placing destination type " of node.Figure 11, which is shown, can be selected for SMS's or SPS according at least some embodimentsMultiple placement destinations type of node.
In the embodiment of description, placement destination can be selected for some type of in provider network 1002SMS/SPS functional type (for example, intake, storage, retrieval, control or processing), and it is being used for other kinds of SMS/SPS functionOutside the provider network 1002 of energy type.In provider network 1002, multi-tenant example host 1103 can be used to realize oneA little resources, such as calculated examples, storage example or database instance.This multi-tenant example host can be used for one or moreEach of the SMS or SPS node of a client place is instantiated, and can be formed and be placed destination type A.In order to avoidHave to share physical resource with other clients, some clients can request their SMS/SPS node use to be confined to listThe example host of a client is realized.This list tenant example host 1104, which can be formed, places destination type B.For severalReason, from the perspective of some clients, single tenant's example host 1104 may be preferred.Due to multi-tenant example hostIt may include the calculated examples for being subordinated to other clients, it can than in single tenant's example host 1104 in multi-tenant example hostThere can be the more high likelihood of the security attack of the example from another client.In addition, when using single tenant's example hostIt, can also be to avoid the calculated examples CI1 experience workload of one of client run on multi-tenant host when 1104The calculating cycle or other resources of the host of large scale are increased sharply and start to consume, therefore potentially influences another client" noisy neighbours " phenomenon of the performance of the application program run on different calculated examples CI2.
In the embodiment of description, the virtual network (IVN) (such as IVN 1106A and 1106B) of isolation, which can represent, is putSet destination Type C.In some embodiments, the request of provider network client can be answered to create IVN as dedicated networkLogic equivalent, but in the case where network configuration is just largely controlled by client can be used provider networkResource construction IVN.For example, client can determine there is IP address ready for use in IVN, repeat to exist without worryingA possibility that IP address used outside IVN.In the embodiment of description, various types are realized in one or more IVNSMS and SPS node can for client flow data management and/or processing increase extra level internet security.OneIn a little situations, given client may want to the SMS/SPS node that a functional type is placed in an IVN, and notThe SMS/SPS node of different function type is placed in same IVN.In various embodiments, given IVN may include that single tenant is realExample host 1104, multi-tenant example host or two kinds of example host.In some embodiments, using supplier's netAnother group of placement destination type selection (or secure configuration file selection) (being not shown in Figure 11) of the resource of network is at leastSome clients can be available.In client service can be calculated from the virtualization for flowing relevant operation of provider networkIt obtains and using in the embodiment of computing resource, the calculated examples can be used in one of both of which.OneIn kind mode, client can provide an executable program or multiple executable programs, the executable program for SPS or SMSHave and stay at the calculated examples for being configured as SPS working node (or at intake, storage or retrieval node) operation, and makesSMS or SPS operation described program and management node.This first mode can be referred to as using the calculated examples for flowing operation" stream service management " mode.In other modes, client be may want to from SPS or SMS less support the case whereLower operation executable program simultaneously manages calculated examples.This second mode can be referred to as using the calculated examples for flowing operation" client-side management " mode.Therefore both operation modes can be represented about the selectable placement destination type of clientOr the other selection of secure configuration file.If such as executable program is likely to require debugging (including single-step debug), visitorThe mode of client-side management may be selected in family end, and the debugging can best be executed by the subject matter expert of the tissue from client,And the mode for flowing service management can be reasonable selection for being less likely the more mature code that needs are debugged.In some realitiesIt applies in scheme, different price strategies can be applied to both modes.
In the embodiment shown in Figure 11, many placement options can be supported at the facility outside provider network.ExampleSuch as, the host 1160 that the library SMS 1171 and/or the library SPS 1172 are mounted on can be used for from client facility (such as visitorAll data centers in family end or place) 1110A or flow management or processing in 1110B, the client of two of them type setIt applies and is different in the mode that they are connected to provider network.Client facility 1110A is by by least some sharedInternet link 1151 is linked to provider network 1002, and (i.e. the network flow of other entities can also be in client facility 1110ASome between provider network 1002 chain flowing).In contrast, some client facilities (such as 1110B) can be withProvider network is linked to by unshared private link 1176 (sometimes can referred to as " being directly connected to " link).SchemingIn term used in 11, respectively includes placing destination type D at both different types of clients and place purposeGround type E.In some embodiments, the part of SMS and/or SPS third party's facility 1120 (such as using but not byThe client of SMS/SPS is all or the data center of management) at be also possible to achievable, and this third party place can be withIt is designated as placing destination type F.In at least some clients and/or third party place, the library SMS and/or SPS is possible mustIt must obtain, and be mounted on the host for being ready to use in SMS/SPS node from provider network.In at least one embodimentIn, the node of all different functional types can be realized outside provider network by means of library appropriate.
In different implementation scenarios, different placement destination types may differ from that in various security-related aspectsThis, the intrusion detection feature of the Network Isolation feature, support such as realized, the physical security strategy of realization, support encryption stageNot etc..Therefore, each in various destination types may be considered that with corresponding secure configuration file, and the safety is matchedOther secure configuration files for placing destination can be different from one or more ways by setting file.In Figure 12 a and Figure 12 bIt is shown, in some embodiments, the client of SMS and/or SPS can programmatically (such as by be sent to SMS orThe request of one or more control nodes of SPS) it is different subsystems or the corresponding placement destination type of node collection selection.It should be noted that client may want to control and place mesh in some embodiments and for certain form of streaming applicationGround type, this be not only for security reasons but also for performance and/or function reason.For example, can be by usingPrivate client place resource or single tenant's example host 1104 avoid noisy neighbours' phenomenon described above.In some implementationsIn scheme, client can have them to be desirably used for the dedicated or proprietary hardware and/or software of SPS stage or SMS node, whereinIt can not easily be replicated at provider network using the accessible Functional Capability of this component or performance level, or only existedIt is not supported at provider network.Client may access has supercomputer horizontal processing ability at external data centerComputer server, such as the computer server can with than be used alone provider network resource will likely obtainThe much higher rate of the rate obtained is handled to execute SPS.It enables the client to that placement destination is selected to allow for various nodesUse this dedicated unit or software.
Figure 12 a and Figure 12 b be shown respectively according at least some embodiments can be by SPS client and SMS clientThe example of the secure option request of submission.Figure 12 a shows SPS secure option request 1200, and wherein client instruction has the stageThe processing stage of ID 1210, SPS control node are placed in destination type (PDT) 1212 and SPS working node PDT 1214One or more.In at least one embodiment, client can be can submit request to think their flow data noteRecord or the configuration encryption setting of stream process result, such as through request before those data are recorded in and transmit in various network linkingsThey are encrypted using specified algorithm or agreement, or request encrypts the interaction of various controls or management.For example, the encryption setting for the stage can refer to the encryption for being shown with the result to be applied operated to phase process in Figure 12 aTechnology, and/or the encryption for the communication between the control node and the working node in the stage in the stage.
Similarly, in Figure 12 b, the SMS secure option request 1250 of client includes many elements, the element instructionClient is for having the safety preference of one or more streams of specified stream ID 1252.For intake node, memory node and inspectionThe placement destination type preference of socket point can be respectively in intake node PDT 1254, memory node PDT 1258 and retrieval nodeIt is indicated in PDT 1262.The PDT preference of intake control node, storage control node and retrieval control node can be distinguishedIt is indicated by intake control node PDT 1256, storage control node PDT 1260 and retrieval control node PDT 1264.It is rightIt can be indicated by encryption setting 1266 in the encryption preference of data record, such as when data record is by the node from a typeWhether and/or how realize and encrypts for it when being transmitted to the node of another type.By using in such as Figure 12 a and Figure 12 bThe secure option request those of shown, client can be (such as in provider network or outside provider networksPortion) the selection place, and for they flow management and processing environment different piece various other security configurations textPart component.
It should be noted that the selection that node places destination may be in addition to safety at least some embodimentsOther reasons provide.For example, for performance reason (such as " noisy neighbours " problem in order to avoid pointing out before, rather thanPrimarily for security reason), client may want to some type of SMS realized at single tenant's host or SPS sectionPoint.In at least some embodiments, place selection can change during the service life of stream, such as client can initially permitPerhaps SMS node instantiates at multi-tenant example host, but may want to move at least some subsets of the node laterTo single tenant's example host.In at least some embodiments, different price strategies can be applied to different safety-relatedOption, for example, realizing that the SMS node of specific function type may be more real than at the multi-tenant example host outside IVN at IVNThe SMS node cost of existing specific function type is higher, or realizes that SMS node may be than in more rents at single tenant's example hostIt is higher to realize that SMS node is spent at the example host of family.
Flow the sequential storage and retrieval of record
For the streaming application of many types, data record can be raw from multiple data with very high rate at SMSBusiness men receives, and data consumer may it is generally desirable to access the data record of storage to generate the sequence of the record.As previously mentioned, especially in the environment that spinning disk is used as the storage device recorded for flow data, the I/O of sequenceAccess module (for reading and writing) can have the significant performance advantage better than random I/O access module.In several implementationsIn scheme, the sequence number that stream is specified or subregion is specified can distribute to them when data record is received by SMS, and can supportOrdered retrieval operation based on sequence number.Figure 13 a show according at least some embodiments flow data manufacturer 120 withExemplary interaction between the intake subsystem 204 of SMS.Flow data manufacturer can submit data record 110 to intake subsystem,And in the embodiment of description, intake subsystem can reply the sequence number 102 for being selected for the record submitted.ExtremelyIn few some embodiments, intake node can obtain the part of the sequence number from storage subsystem, such as in this embodiment partySequence number can determine after the storage of received data record according to applicable persistence strategy in case, and store sonSystem can produce the sequence indicators of the number of oneself for the data record, and provide that indicator for being included inIt is distributed in the bigger sequence number of data record by intake node.
Sequence number can be realized in various embodiments to provide the stabilization of data record, consistent sequence, and is madeRepeatable iteration can be carried out on record by data consumer.In at least some implementations, particular zones are distributed toThe sequence number of data record can be increased monotonically at any time, although they need not be continuous.In various embodiments, sequenceRow number can be designated have at least some of following semanteme subset: (a) sequence number is unique in stream, that is, is not hadTwo data records of given stream can be assigned identical sequence number;(b) in the available data record for accomplishing stream of sequence numberIndex, and can be used to be iterated in the data record in given stream subregion;(c) for any data-oriented manufacturer,Data manufacturer successfully submits the sequence of data record to be reflected in the sequence number for distributing to data record;And it (d) is used forThe sequence number of data record with given partitioning key values entirely again dynamic partition operation on keep be increased monotonically semanteme,Such as it distributes to the sequence number of the data record with partitioning key values K1 and may each greater than distribute to after subregion again dynamicState has any sequence number of the data record of that partitioning key values K1 before subregion again.(below with reference to Figure 16 is furtherDescribe dynamic subregion again in detail.
In some embodiments, data manufacturer may want to the sequence for influencing to be selected at least some data recordsThe selection of row number.For example, data manufacturer may want to define boundary or separator in the sequence number of the distribution of stream, so thatThe read requests of the specific subset for stream are submitted to become easier to for the data consumer of the stream.In some realizationsIn mode, data manufacturer can submit minmal sequence number together with the instruction of record, and SMS can be selected according to the minimum value of requestSequence number is selected, it is semantic that the minimum value also complies with sequence number discussed above.
Figure 13 b shows the data record that intake can be generated at SMS according at least some embodimentsThe exemplary elements of sequence number.In the embodiment of description, sequence number may include four elements: n1 SMS version numbers 1302,N2 timestamp/epoch values 1304, the seat n3 sequence number 1306 and n4 partition numbers 1308.In some implementations, can makeWith 128 bit sequence numbers, such as n1, n2, n3 and n4 can be 4,44,64 and 16 respectively.Version number 1302 can be onlyConfusion for avoiding entire SMS software version from issuing, such as so that relatively easily inform any version of SMS softwareOriginally it is used to generate sequence number.In at least some implementations, version number 1302 may not expect to be frequently changed.It can be such asN2 timestamp/epoch values 1304 are obtained from local clock source or globally accessible clock source by intake subsystem node(for example, realizing the confession for obtaining current epoch (getCurrentEpoch) or obtaining current time (getCurrentTime) APIAnswer the system for managing state of quotient's network).In at least some implementations, from the offset at well-known time point (for example,It, can be by being based on Unix from the past number of seconds of 00:00:00AM UTC on January 1st, 1970TMOperating system in callThe system calling of various time correlations obtains) it can be used for n2 timestamp/epoch values 1304.In some embodiments, sequenceRow number 1036 can be generated by storage subsystem, and can indicate the sequence of the data record write storage device of particular zones.CauseThis is recorded in reception in the given second in many data and n2 timestamp/epoch values 1304 are only in interval variation in approximate one secondImplementation in, sequence number 1306 can be used as the indicator that (or storage) sequence is reached for the record of data record, describedData record just reaches within the identical second and is therefore assigned identical timestamp value.In some embodiments,Partition number 1308 can unique identification go out the subregion in given stream.It is corresponding in sequence number timestamp (at least approximately) instruction intakeIn at least some implementations of the clock time of data record, sequence number can be used for Indexing Mechanism, and the Indexing Mechanism is used forCertain form of time-based retrieval request.For example, client may want to retrieval in particular day or when specifiedBetween generate during range or the stream of intake record, and sequence number can be used as the key of implicit secondary index to retrieve and suitably organizeData record.Therefore, at least some embodiments, the sequence number comprising the timestamp for orderly storing and retrieving makesWith can have other benefit, that is, provide the time index in the data record of described group of storage.
Usually usually the data record of given subregion can be sequentially written in sequence number by using big sequence write operation(such as to disk).In some embodiments, as previously noted, it can be achieved that the programming interface based on iterator, to permitPerhaps data consumer is with sequence number sequence reads data log.Figure 14 is shown according at least some embodiments at SMSThe orderly example stored and retrieve of flow data record.Six DR 110A-DR of subregion Sj-Pk (k-th of subregion of stream Sj)110F is shown to store with sequence number sequence.As shown, sequence number can not be continuously at least some embodiments, such as may due to assigning values to the mode of n2 timestamp/epoch values 1304 or sequence number 1306 as discussed aboveAlways do not generate the continuous value for being used for those elements.
In the example being shown in FIG. 14, data consumer has requested that generation iteration by specified starting sequence number " 865 "Device.In response to the request, SMS has initialized iterator 1, and the iterator 1 is positioned in the number with nearest sequence numberAt center, the nearest sequence number is greater than or equal to requested starting sequence number.In this case, due to nextLower sequence number (860, be assigned to DR 110B) is less than the starting sequence number in the request of consumer, has sequence number 871DR 110C be selected as the initial position of iterator.It obtains iterator interface and is contemplated that the requested position in subregionThe logic equivalent of the request of place's setting pointer is set, and obtain next record interface to be subsequently used to from the pointer positionBeginning reads data log is set, such as pointer is moved along stream with sequence number sequence.In the example shown, data consumerIt has called and has obtained next record interface, wherein parameter " iterator " is arranged to iterator 1, and " maximum number record(maxNumRecord) " (maximum number of the data record of return) is arranged to 3.Therefore, SMS retrieval subsystem is by DR 110C, DR110D and DR 110E sequentially returns to data consumer with that.Iterator (iterator 1) is completed to obtain next record callingIt is movable to new position later, such as to DR 110F, and the next record of the subsequent acquisition for identical iterator is adjustedWith can return to DR 110F originate data record.In some embodiments, the semanteme of iterator calling is obtained in some realitiesApplying in scheme can be different, such as iterator can be positioned in the highest equal to or less than requested sequence numberAt the nearest data record of sequence number, rather than the iterator is located in have and is greater than or equal to assigned serial number mostAt the data record of nearly sequence number.In another embodiment, client may must specify in obtaining iterator callingExisting sequence number, for example, if being recorded in stream with requested sequence number is not present, then can return to mistake.
Subregion mapping
As described above, in various embodiments, to the intake of the record of given stream, store, retrieve and process it is relatedWorkload can be divided and distribute in several nodes according to various subregions and again partitioning strategies.Figure 15 is shown according at leastThe flow point area mapping 1501 of some embodiments and the example that the correspondence configuration decisions that SMS and SPS node is made can be directed to.WhenWhen creation or initialization specific data stream, such as the calling of the creation stream API in response to client, partitioning strategies, which can start, to be used forThe stream, the partitioning strategies may be used to determine subregion, and any data-oriented record of stream is considered the member of the subregion.It absorbs subsystem 204, storage subsystem 206, retrieval subsystem 208 and needs to be directed to any of data-oriented record execution operationThe specific node of related SPS processing stage can be selected on the basis of the subregion of record.In one embodiment, it is used forAt least one subset of the control node of data-oriented record may be based on subregion and be selected.In at least some embodimentsIn, the dynamic of support data stream part of the subregion as partitioning strategies again, such as in response to pointed in the strategyTrigger condition or in response to explicitly requesting.
In various embodiments, selection can be dependent on the subregion for the record for the subregion of data-oriented recordKey, the value of the subregion key can be by data manufacturers directly (such as write-in or place parameter of request) or indirectly(for example, SMS can be used metadata as subregion key, the metadata such as identifier of data manufacturer client or title,The part of the actual content of the IP address or data record of data manufacturer) supply.In the embodiment being shown in FIG. 15,One or more subregion mapping functions 1506 can be applied to DR subregion key/attribute 1502, to determine DR partition id 1510.OneIn kind implementation, for example, given DR partition id 1510 can represent the successive range on the space of 128 integer values, so thatThe union of the range for all subregions that must be used to flow can cover all possible value that 128 integers can assume that.Show thisIn example disposition border, a simple subregion mapping function 1506 can be generated from the partitioning key values of data record or selected attribute value128 hashed values, and partition identifier can be placed exactly in the specific successive range in it based on hashed value to determine.SomeIn implementation, successive range can at least original size it is equal;In other implementations, different subregions can correspond to possibilityWith the successive range to differ in size from one another.In one implementation, subregion also can produce the adjusting to range limit again.ItsHe can be used in different implementations partition functions.
If data flow undergoes dynamically subregion (as further discussed in detail) again, the record institute with particular keyThe subregion being mapped to can change.Therefore, at least some embodiments, SMS and/or SPS control node may must beRecord is applied to several different mappings of stream during the service life of stream.In some embodiments, such as optional timeStamp/SN validity range 1511 metadata can be stored by the control node mapped for each subregion.Optional timestamp/SNValidity range 1511 can for example indicate specifically to map M1 from the creation time of stream until time T1 is applied, and indicate differentMapping M2 is applied from T1 to T2.When the read requests in response to guiding at stream, retrieval node is possible must first reallySurely it will use any mapping (such as depending on sequence number indicated in read requests), and then be known using that mappingMemory node not appropriate.
In at least some embodiments, SMS and SPS control node can be responsible for mapping to subregion into several different interval rulersThe resource at very little place.For example, as shown in the example implementations 1599 of Figure 15, in one implementation, absorb, store,Retrieval or processing (work) node can respectively be embodied as the execution in server virtual machine corresponding process or corresponding thread,Such as JavaTMVirtual machine (JVM) or calculated examples, and JVM or calculated examples can respectively instantiate at specific physical host.In some embodiments, multiple JVM can start in single calculated examples, to increase another layer of resource impact decision.Therefore, for giving subregion, one or more control nodes may be selected that any specific resources will be used as intake node1515, memory node 1520, retrieval node 1525 or processing stage PS1 working node 1530A and processing stage PS2 working node1530B.Control node may further determine that those nodes to server (such as intake server 1535, storage server 1540, retrievalServer 1545 or processing server 1550A and processing server 1550B) mapping and server (such as absorbed with hostHost 1555, storage host 1560, retrieval host 1565 or handle host 1570A and handle host 1570B) between mapping.In some implementations, subregion mapping may be considered that including show each resource size of space (such as node, serviceDevice and the host size of space) each at identification information (for example, resource identifier), be used as a subregion mapping function orThe instruction of the data record attribute of the input of multiple subregion mapping functions 1506 and subregion mapping function 1506 itself.ControlServer can store the representative of the subregion mapping in metadata storage, and can expose various API in some embodiments(such as acquisition partition information (getPartitionInfo) API) or other programming interface, for data manufacturer, data consumptionPerson provides mapping information for the node of SMS subsystem or SPS.
Data be recorded the mapping of subregion and mapping from subregion to resource in some embodiments by it is various becauseElement may become more complicated, the factor such as: (a) in some embodiments, give node, server or host can be withIt is designated to be responsible for multiple subregions, or (b) is being assigned into given subregion or the new node of partition set, server or hostThere may be failure or other triggerings.In addition, the subregion for given stream reflects as hereinbefore pointed out and be described belowPenetrate dynamically to modify at any time, and flows record and continue by SMS and/or SPS node processing.Therefore, in some embodiments,The map metadata of several versions can temporarily, at least be preserved for given stream, and each version corresponds to the different periods.
Dynamic stream subregion again
Figure 16 shows the example according to the dynamic stream of at least some embodiments again subregion.The time being shown in FIG. 16At the time T1 of axis, creation or initialization flow S1.Subregion mapping PM1 is created for stream S1, and in time interval T1 to the T2 phaseBetween keep effective.By SMS, received three data records are shown by example between T1 and T2.DR 110A (DR110A) quiltThe partitioning key values " Alice " with client supply are submitted, DR 110B is submitted the partitioning key values with client supply" Bill ", and DR110C is submitted the partitioning key values " Charlie " with client supply.In initial mapping PM1, ownThree DR 110A, DR 110B and DR 110C are mapped to the identical partitions with partition identifier " P1 ".P1 data are rememberedRecord, individual node I1 are configured to handle intake, and individual node S1 is configured to handle storage, and individual node R1 is configured to locateReason retrieval, and single working node W1 is configured to handle SPS processing.When for mapping the starting of validity range of PM1Between stamp be configured to T1.
In the Exemplary temporal axis of Figure 16, at time T2, stream S1 is by dynamic again subregion.In the embodiment of descriptionIn, data record is continued to and is handled by SMS and SPS, and subregion again occurs without taking into account when;Either SMS or SPSOff line is not all needed.Again subregion can start due to any one of many factors, for example, in response to intake, storage,Retrieve or handle the detection of the overload at node, in response to the workload level at the different hosts of each subsystemBetween inclination or it is unbalanced detection or in response to the request from data consumer or data manufacturer's client.It is retouchingIn the embodiment drawn, new mapping PM2 (or T2 in the near future) at time T2 works, and is such as used for PM2 by showingValidity range initial time stamp setting it is indicated.In at least some implementations, difference group data record attribute is removedIt can be used for except use to data record partitioning before subregion again.In some cases, zone attribute in addition can (exampleSuch as at the request of SMS) it is submitted by data manufacturer, and in other cases, the other attribute can absorb node by SMSIt generates.This other attribute can be described as " adding salt " attribute, and can using the technology for the other attribute of subregion againReferred to as " add salt ".In a kind of example implementations, overload intake server can be to data manufacturer (such as to by countingThe SMS client library code executed according to manufacturer) instruction: for subregion again, provided other than previously used subregion keyRandomly selected lesser integer value.Raw partition key and the combination for the other integer for adding salt then can be used in differenceDistribution intake workload in group intake node.In some embodiments, it retrieves node and/or data consumer may necessary quiltIt informs about the other attribute for subregion again.In at least some implementations, this other attribute can be not used inAgain subregion.
In the embodiment being shown in FIG. 16, relative to the subregion for being selected for identical key before T2, newSubregion mapping generates the different subregions for being selected for received at least some data records after t 2.DR 110P T2 itAfter be submitted and be submitted after t 2 with partitioning key values " Alice ", DR 110Q with partitioning key values " Bill ", and DR110R is submitted after t 2 with partitioning key values " Charlie ".In the Exemplary contexts shown, mapped by using PM2,DR 110P is designated the member of subregion " P4 ", and DR 110Q is designated the member of subregion " P5 ", and DR 110R is designated subregionThe member of " P6 ".In the embodiment of description, none is shown as received exemplary data record after t 2 and is referred toIt is set to the member of previously used subregion " P1 ", on the contrary, completely new subregion can be used after subregion again.In some embodimentsIn, at least some previously used subregions can continue to use after subregion again.For each in new subregion P4, P5 and P6A, different nodes can be specified for intake, storage, retrieval and/or processing.For example, node I4, S4, R4 and W4 can be withIt is configured for subregion P4, node I5, S5, R5 and P5 can be configured for subregion P5 and node I6, S6, R6 and P6 canTo be configured for subregion P6.In some embodiments, such as before subregion again be used for this record, subregion again itIdentical memory node can be used for having particular zones key or attribute afterwards record but storage different in the nodePoint (for example, different disks, different disk partition or different SSD can be used after subregion again).
Again during at least some periods after subregion, retrieval request can continue to be retrieved for counting for dynamic at T2According to record, the data are handled before being recorded in again subregion by SMS intake and/or storage subsystem.In at least some situationsUnder, the data record of request may must be based on PM1 mapping to retrieve, and the PM1 is when being mapped in intake data recordEffectively.Therefore, as indicated in Figure 16, for the purpose of data retrieval, when PM1 and PM2 can continue one section after t 2Between use.In at least some implementations, data record can be deleted finally from stream when their agings, and old subregionMapping can also be abandoned finally, such as when all corresponding data records have been deleted itself.In some embodiments, instead ofIt is deleted (or before deletion), stream record can achieve (such as archival strategy based on client selection) and deposit to different groupsPlace or device are stored up, so that the subregion mapping used by SMS can be still that can be used to retrieve record after archive.ThisIn embodiment, the subregion mapping of such as PM1 and PM2 can retain, as long as they are needed support for archive storage deviceRetrieval request.In the implementation of some archives, can be used do not need retain flow point area mapping different search methods (such asNew index can create the data record for archive).In some embodiments, such as P2 before subregion againIt is using can a certain moment quilt after subregion again by the subregion for not re-booting to it but be written after subregion again" closing " is used to read, such as can provide the equivalent of " partition end of arrival " error message in response to retrieval request.
In some implementations, data-oriented stream can be divided into many (such as hundreds and thousands of) a subregions.ConsiderA kind of exemplary cases, wherein stream S1 is initially divided into 1000 subregions, P1, P2 ..., P1000.Corresponding to one pointIn the case where the overload in area, for example P7 is deleted, this initial mapping that P7 is recorded for changing data may be worth, but the mapping of other subregions haves no need to change.In one approach, can be created by division operation again two it is newSubregion P1001 and P1002.After subregion again received record can be mapped to after subregion again P1001 orP1002, therefore the workload of P7 is distributed in two subregions, the attribute of the record will be initially (i.e. on the basis of original mappingsOn) their member relation has been generated in P7.Such as the remaining subregion of P1-P6 and P8-P1000 can not need to modify.When the relatively small subset of only subregion by it is this subregion is influenced again when, at least some embodiments, can produce and depositStore up combined data structure, such as directed acyclic graph of subregion entry (or tree of subregion entry).Each entry can indicate subregionFunction output area and validity time range (period that partition information of entry is considered valid during that).UpperIn the example of text, it is assumed that the subregion again for being related to P7 executes at time T2, and flows S1 (and its initial mapping) at time T1Creation.Under this situation, the validity period for the entry about P7 will be " T1 to T2 ", for P1001's and P1002Validity period will be " T2 is forward ", and the validity period for remaining subregion will be " T1 is forward ".In at least some realitiesIn existing mode, the number of the memory or storage device for subregion map metadata can be caused using this combined data structureThe substantive of amount is reduced.In example above, discusses and subregion P7 is separated into two new subregions.In at least some realizationsIn mode, subregion can also be merged during subregion again, and for example, it receives relatively little of retrieval request or submits phaseTwo to few record neighbouring subregions are combinable at single subregion.For any time point, partition functions can be used and haveEffect property time range information expressly determines the subregion of data record institute subordinate.With the variation of time, combined data knotStructure can develop as more separate sections and/or execute merging, but possible for total space needed for subregion metadata (whenThe influence that how long separation once occurs and how many average subregion is segregated so depended on) insignificantly increase.Compared to itUnder, in different implementations, whenever subregion again occurs, the constant metadata of the whole group for stream can be replicated andIt is combined with the entry of the subregion for being influenced by subregion again.In a kind of implementation below, subregion is reflectedThe demand of the storage device and memory of penetrating metadata may be increased with faster rate, especially if old mapping is as aboveAt least a period of time may must be retained described in literary after subregion again.
Using the sequence number for including timestamp value (n2 timestamp/epoch values 1304 shown in such as Figure 13 b)In at least some embodiments, the sequence number transformation of specified type can be implemented to dynamic subregion again.Pass through example vacationIf the sequence number scheme based on timestamp for being similar to scheme shown in Figure 13 b is used for stream S1, wherein per second generate newlyTimestamp value is included in the sequence number.Supporting dynamic again at least some implementations of subregion, in addition to dynamicState is again before subregion except use, the sequence number dynamically distributed after subregion again can be all using different groups of timestampValue (is started) with corresponding to again the selected initial time stamp value of partition event.For example, if submitting (even if it comes into force)Dynamically used timestamp value is Tk at the time of subregion again, then any new sequence issued after described submitRow number may be needed to coming using timestamp value Tk+1.As sequence number value in the scheme used in Figure 13 b in the timeThe high-order position of timestamp value it is at least some in they are encoded, it is ensured that again partition event correspond to foregoing timestampBoundary, which can transfer to simplify, is identifying bookkeeping involved in mapping ready for use in response to retrieval request.Therefore, thisIn implementation, when receiving the retrieval request for specifying specific sequence number, can from the sequence number extraction time timestamp value,And whether should use mapping again subregion after, or whether should use reflecting before subregion again if can easily determineIt penetrates.If the timestamp value extracted is lower than the initial time stamp for being selected for again subregion, before subregion again can be usedMapping, and if the timestamp value extracted, which is equal to or higher than, is selected for again the initial time stamp value of subregion,Mapping after subregion again can be used.
Method for flow management and processing
Figure 17 is to may perform to support for data record intake and data note according to showing at least some embodimentsRecord the flow chart of the operating aspect of the respective sets programming interface of retrieval.As shown in step 1701, can for example from SMS client orData manufacturer's client receives the request for creation or initialization data stream.It is described to can determine that (step 1704) is ready to use inThe initial subregion mapping of stream, such as point for needing particular data record institute subordinate for identification can be identified based on partitioning strategiesThe function in area and there is the input parameter for being ready to use in the function.As previously mentioned, in various embodiments, the control of SMSComponent can be responsible for receiving and responding to stream request to create.Realize the side of stream creation and initialization (and other control plane operations)Formula can be different from an embodiment to another embodiment.In one embodiment, for example, control service can be establishedThe redundant group of device, and the main control server of the redundant group can be by generating in persistent storage place and storingMetadata appropriate (for example, initial subregion maps, initial intake, storage and retrieval node collection etc.) for new stream is comeResponse stream request to create.It can be generated by using the main control server of the metadata of storage to subsequent about describedThe response of the inquiry (such as request about the backend nodes for being responsible for given subregion from front end intake node) of stream.In SMSIn another implementation of control plane function, stream configuration metadata can be stored in the database, the database is by taking the photographTake, store or at least some nodes of retrieval subsystem be directly it is addressable.It is creating with after initialization flow, is usually existingIn the case where other interaction not with control unit, the data plane behaviour that such as record is submitted, stores and retrieved can be startedMake, and the data plane can be handled by the corresponding component of corresponding subsystem and operated.
In some embodiments, data manufacturer may need to submit the specific subregion key with write request, andIt in other embodiments, can (such as identity of data manufacturer be received from it based on metadata associated with write requestThe IP address of data record) determine the input for being ready to use in partition functions, or determined from the content of data record itselfThere is the input for being ready to use in partition functions.In at least one implementation, client is optionally supplied in data record submissionPartition identifier is answered, and in this implementation, it may not be necessary to partition functions in addition.
When the initial node collection (step 1707) for determining or being configured to intake, storage and retrieval functions for the streamWhen, it may be considered that many different factors.For example, subregion map itself (it, which can determine, is divided into how many a subregions for the stream,And the relatively expected size of the subregion), about expected uptake rate and/or retrieval rate information (if thisInformation is useful), for flow data record durability/persistence requirements and/or for the high-availability requirement of each subsystem(it can cause the foundation for being similar to the redundant group of those shown in Fig. 9 and Figure 10) can influence the number of the node of different sub-systemsAmount and placement.In addition, can indicate putting for various nodes (as shown in Figure 11, Figure 12 a and Figure 12 b) in clientIn the embodiment for setting destination type preference, this preference can also have the resource for being ready to use in SMS and/or SPS node in determinationIn work.In at least some embodiments, the node for being able to carry out intake, storage and/or search function can be established in advanceCorresponding pond, and the selection member in this pond can be distributed to the new stream of each of creation by control unit.In other implementationsIn scheme, at least in some cases, when creation or initialization flow, it may be necessary to instantiate new intake, storage or retrievalNode.
It, can be by being implemented to any group of of data record submission at the intake node in the embodiment of descriptionProgramming interface records (step 1710) to receive, including for example submitting interface online (includes wherein, submitting request by dataIn) and with reference to submitting interface (wherein, to provide address in submitting request, node or SMS memory node can be absorbed for example by SMSRetrieval data are requested from the submission using network service request or other interfaces).It can be directed to and mention in different implementation scenariosIt hands over each in the method for record to provide any amount of different types of programming interface, such as can support to apply accordinglyProgram Interfaces (API) can establish webpage or network address, it can be achieved that graphic user interface or can for submitting online to referenceDevelop command-line tool.In at least some embodiments, SMS can be the record assigned sequence number of each intake, such as indicateThe sequence of intake or storage record, and sequence number can be used by the retrieval request of data consumer.In retrieval subsystemIt unites at node, record retrieval request can be received by the programming Retrieval Interface of any group of realization, and can mention in responseFor the content (step 1713) of the data record of request.For non-sequential access, interface may include for example obtaining iterator (to be based onSequence number indicated in iterator calling is obtained to request to have iterator to be instantiated at the position selected in subregion) orPerson obtains the record (getRecordWithSequenceNumber) with sequence number (to obtain the number with assigned serial numberAccording to record).For sequential access, it can be achieved that such as obtain next record interface (since the current location of iterator orRequest multiple records in order since specified sequence number).In at least some embodiments, different Retrieval Interfaces can haveThere are the different charge rates being associated, such as can will be used for being arranged to by the charge rate of record lower than use for ordered retrievalIn the charge rate by record of non-sequential retrieval.In some embodiments, different submission interfaces can also have different metersRate, for example, it is higher by cost is recorded with reference to the comparable online submission of submission.
With the variation of time, control node or dedicated accounting server can be acquired in each of flow management serviceService index (the step 1716) for the different programming interface realized at subsystem.The index may include for example: difference programming connectsThe calling of mouth counts, (it may differ from the calling counting at least some interfaces to the sum of the record of intake or retrieval, describedInterface can such as be used to retrieve the next record interface of acquisition of multiple records by individually calling), intake or retrievalThe sum etc. of data.It is optionally at least partially based on service index and produces with the associated corresponding charge rate of programming interfaceThe raw charging volume for needing to be collected to the client or production and/or consumption for possessing the stream come the client for the data flow automatically(step 1719).In at least some embodiments, billing activities can be asynchronous, example relative to stream intake/search operaqtionAs can based on during this month collected index come the monthly charging phase at the end of generate charging.
Figure 18 a is to show the operation side that may perform to configure stream process (SPS) stage according at least some embodimentsThe flow chart in face., it can be achieved that programming interface is so that client can be permitted for flow data record configuration as shown in step 1801The multiprocessing stage.In order to configure moment, such as client can indicate there is the flow data record for staying in subregion in the stageThe processing operation of upper execution, for processing operation output allocation strategy and other parameters, data such as to be processed willThe identity of the inlet flow obtained from it.In some embodiments, the processing operation in SPS stage may need to be idempotent.InIn other embodiments, non-idempotent can also be supported to operate at least some stages.In some embodiments, if givenPending processing is non-idempotent at stage, is regularly removed by configuration work node for place outside some persistenceThe output of operation to record clear operation when relative to record sorted order to execute, and then match during restorationReplacement working node is set to recur the clear operation, client can still be able to obtain the relevant benefit of recovery of idempotence.In at least some embodiments, client can using parallel work-flow on flow data several different states and by withThe result in some stages of the inlet flow in other stages is acted on to configure directed acyclic graph (DAG) or other processing stagesFigure.In some embodiments, one or more temporary currents can be created between different phase rather than lasting stream, such as need notThe data record output from a stage persistence must be stored in front of being fed to different phase as input to depositOn storage device.
In some embodiments, it can be achieved that any amount of different recovery policy is used for SPS stage, including such as baseIn the recovery policy or recovery policy as possible of check point.In one embodiment, programming interface can be used to select in clientRecovery policy for the different SPS stages.At the stage using the recovery policy based on check point, working node can be configuredDiscontinuously store progress record or check point, thus instruction reach in the flow point area that they arrived what degree (for example,The sequence number of the record handled recently can be stored as to the indicator of the progress).Below with reference to described in Figure 19, thereforeAfter barrier, progress record then can be used during recovery operation.In recovery policy as possible, storage progress record is not needed, andAnd the replacement working node configured in response to failure can carry out simple process to it when receiving new data record.It is givingDetermine in SPS stage diagram or workflow, in some embodiments, different recovery policies can be applied to the different stages.
SPS control server can for example receive idempotent behaviour by one in programming interface indicated in step 1801Making the instruction of Op1, the idempotent operation Op1 has according to partitioning strategies PPol1 to be executed at the moment PS1 for staying in stream S1,Described in the result of processing need to be allocated (step 1804) according to output distribution descriptor DDesc1.Can for example based onVarious factors is to determine the quantity for the working node for needing to be configured to stage PS1 and for virtually or physically money needed for nodeSource (step 1807), the factor such as Ppol1, idempotent operate the complexity of Op1 and have the resource for being ready to use in working nodePerformance capability.
Then can instantiate with configuration work node (step 1810), such as selected virtually or physically machine Energy Resources ServiceProcess or thread.In a kind of simple implementation, for example, can one working node of initial allocation for S1 each divideArea.It can configure given working node to come: (a) receiving data record from the subset appropriate of the retrieval node of S1, (b) receivingData record on execute Op1, (c) optionally, such as which processed group of instruction stored based on the recovery policy for PS1Progress record/check point of partitioned record, and (d) output is transmitted to and (such as is used as by the destination of DDesc1 instructionIntermediate lasting stream or temporary current or the input for directly arriving other processing stages or storage system).It should be noted that at least in some realitiesIt applies in scheme, SPS processing may not necessarily generate any output transmitted elsewhere on the basis of advance.For example, some SPSApplication program can be used simply as the temporary resource library of data record, and/or may be implemented to allow users to check that data are rememberedThe query interface of record.This application program can manage the output of its own, such as may be in response to received inquiry and not basisDistribution descriptor exports to generate.Recording relevant SPS application program can keep collected most from large scale distributed systemLog recording one day after, such as enable a client to check record data for the purpose of debugging or analysis.Therefore, InIn some embodiments, does not need output distribution descriptor specifying at least some stages for being used for SPS, is used at least some streamsOr it is used at least some subregions.Working node then can start to retrieve and process data note according to their corresponding configuration settingsRecord (step 1813).In at least some embodiments, SPS control node can (such as the responsiveness using such as heart-beat protocolCheck) health status and various other indexs of monitoring node, such as money in the Energy Resources Service for being used for working nodeSource utilizes horizontal (step 1816).For example, if working node should be replaced and realize recovery policy as described below, from workThe information for making node acquisition can be used to determine whether that failure is needed to shift.
In some embodiments, installable SPS client library is provided to hope at all places of clientAnd/or the Energy Resources Service of the client selection in provider network realizes those of SPS working node client.Client library may be used alsoAllow SPS client that them is selected to be desirable for the degree of the various control plane characteristics for the service that SPS is managed, such as health prisonBrake, automatic workload monitoring and equilibrium, safety management, dynamically subregion etc. again.Figure 18 b is according at least some embodiment partyCase is shown in response to the executable operating aspect of the component invocation of the client library of the configuration for stream process working nodeFlow chart.As shown in step 1851, it is possible to provide SPS client library (such as by from be configured to execute Figure 18 a shown inThe network address downloading of the service of the multi-tenant SPS management of various operations).The library may include many executable components and/or can chainIt is connected to the component of client application.Some library components aloow client to select SPS management service, manage to SPSService registration or the required characteristic for specifying various working nodes have pending one or more SPS rank at the working nodeThe stream process operation of section.For example, a client may want to using by the provider network of working node it is virtual based onThe calculated examples collection of themselves that service center realizes is calculated, and another client may want to using positioned at client oneselfFor the computing device (dedicated unit that do not supported by provider network such as) at the data center of processing stream record.ClientIt can be serviced online or if necessary using virtual computing at the place of themselves on the basis of if necessaryCalculated examples bring working node.In addition to or replace working node this on-demand instantiation, in some embodimentsIn, client can be pre-configured with the potential reusable working node pond that can be disposed when needed.In some implementations, may be usedService registration of the library component to allow client to manage to SPS is executed or called, can be handled by SPS management service by clientThe subsequent control plane operations that end is instantiated as the working node of specified phases are used for its specific process or thread.In a realityApply in scheme, client can with can from need by for working node SPS management service handle different stage controlIt is selected in plane responsibility processed, for example, a client may want to carry out monitoring using the customized module of themselvesNode health, and another client may want to be used for using SPS management service monitoring node health and ifIt detects failure and takes appropriate action.
SPS management service can receive the instruction (step 1854) that particular clients are desirable for client library, the clientEnd library is used to configure the working node and/or control plane operations of specific SPS stage PS1.(PS1 itself can be used in the libraryIncluded programming interface is similar to shown in Fig. 4 to design, or using what is exposed as the SPS service managed based on netThe programming interface of the interface of network designs).Client can also indicate that stream, and the data of the stream have to be retrieved to be passed through for being used asThe input of PS1.Optionally, at least some embodiments, client can indicate that the control plane for PS1 is arranged, such asWhether client is desirable for the health monitoring ability of the service for node, or whether is ready the health monitoring using customizationTool (step 1857).Depending on the preference as indicated by client, it may be determined that need the SMS used for being configured to clientAnd/or one or more node (steps 1860) of SPS.It can be established between the working node of client and arrive SMS/SPS nodeNetwork connectivity and/or executable other configurations be operable so that data record stream and processing can be obtained as neededAs a result.When receiving retrieval request, data record can be supplied to SP1 working node, and can be executed as needed requiredControl plane operations (if having by client request) (step 1863).It should be noted that at least in some embodiments,It can or alternatively realize similar method, the method, which enables a client to control them, is desirable for SMS management clothesThe degree of the control plane function of each subsystem of business.
Figure 19 is to may perform to realize for the one or more extensive of stream process according to showing at least some embodimentsThe flow chart of the operating aspect of multiple strategy.As shown in step 1901, SPS control node can determine met it is specific for replacingThe trigger criteria of working node, such as working node may become without response or unsound, the workload of present nodeLevel may have reached the threshold value of failure transfer, and the quantity of the mistake detected at working node may be more than threshold value, orPerson can recognize some other unexpected states of working node.It can recognize or instantiate the working node (step of replacement1904).In some embodiments, the pond that can establish available work thread is used as from a wherein optional worker thread and replacesIt changes object, such as new thread or process can be started.
If there is recovery as possible to be used at SPS stage (particular job node is effective at the SPS stage)Strategy (determined by such as step 1907), the working node of replacement can only start when other data record is made available by(step 1916) is handled to them, such as the record of progress for the working node that do not replace needs to check.If wait makeWith the recovery policy based on check point, it is possible to provide instruction (such as storage device address or URL) (step 1910) in place, in instituteIt states and replaces the addressable progress record stored by the working node replaced of working node at place.Replacement working node can be retrievedThe nearest progress record stored by the node replaced, and described group of data record (step is determined using the progress recordIt is rapid 1913), in described group of data record, replacement working node should execute the stage idempotent operation.It is based on thisIn the recovery policy of check point, depending on continuing between last progress record and the time of instantiation replacement working nodeTime, and the rate of working node processed other record after the progress record stored depending on replacement,The data record of some quantity can be handled more than once.In at least some embodiments, if the operation being carrying out is powerDeng, then this repetitive operation can not have negative effect.In replacement working node based on the progress note stored beforeAfter record executes repetition recovery operation, at least some embodiments, the instruction that replacement working node can store its own is completeAt the progress record of recovery, and normal worker thread operation (step can be started in newest received data record1916)。
Figure 20 is to may perform to realize a variety of secure options for being used for data flow according to showing at least some embodimentsOperating aspect flow chart., it can be achieved that one or more programming interface, one or more of volumes as shown in step 2001Journey interface enables a client to be selected from the various secure options for data stream management and processing, the option packetInclude the placement destination for example for the node of different function type (for example, intake, storage, retrieval, processing or control node)Type option.Place destination type may differ from each other in the various aspects of their secure configuration file.In some realitiesIt applies in scheme, has the physical location for the resource for being ready to use in SMS or SPS node can be from a kind of destination type to another destinationType and it is different.For example, the resource of the example host such as at provider network data center can be used for the node, orPerson may be used at resource or usable third party's resource at all facilities of client.In at least some embodiments,Network Isolation rank or other network characterizations can be different from a kind of destination type to another destination type, such as can beThe some SMS or SPS nodes of instantiation in the virtual network of isolation, or pass through dedicated isolation at all facilities of clientSome SMS or SPS nodes are connected to provider network by physical link.In one embodiment, client can indicate supplyingIt answers and needs to be established certain form of SMS or SPS node at single tenant's example host of quotient's network, rather than be also possible to can for useMulti-tenant example host is established.In at least some embodiments, various types of Encryption Options pass through safety-relatedProgramming interface be also possible to it is selectable.
Can be received by safety-related programming interface client about the one or more functions kind for flowing S1The secure configuration file of the node of class selects or preference.For example, client is alternatively used for one of the node of functional type FC1Secure configuration file (such as client may want at all places of client realize SPS working node) and for differenceThe different secure configuration files of the node of functional type FC2 are (for example, client may be ready in provider network data centerRealize SMS intake node or memory node in place) (step 2004).In some cases, client can determine to establish with identicalSecure configuration file all different function types node.In some embodiments, SMS and/or SPS can be limited and is directed toThe placement destination type of the default of various functional types, for example, unless in addition client indicates the isolation in provider networkVirtual network in can establish the nodes of all functional types.
Then can based on client for secure configuration file and/or place preference (or based on for client notThe default setting of the functional type of preference is provided it) configure the node (step 2007) of different function type.The configurationCan be related to for example select physical host appropriate or physical machine, and instantiate by the node of different function type it is appropriate based onExample, virtual machine, process and/or thread are calculated, and establishes network connection appropriate among the nodes.In some embodiments,It can provide for the executable library component (part as the configuration) of different flow managements and processing function for supplyingIt answers and is installed at the host of quotient's network-external.
It, can such as encryption preference according to expressed by client or based on default according at least some embodimentsEncryption is arranged and starts encrypting module (step 2010) at the node of one or more types.It can be with the various functions of subsequent start-upThe node of type, so that such as desirably absorbing, storing, retrieve and/or handling flow data (step 2013) by client.
Figure 21 is to show the behaviour that may perform to realize the partitioning strategies for data flow according at least some embodimentsMake the flow chart of aspect.As shown in step 2101, partitioning strategies can be determined for data flow.The strategy may include such as dataThe initial mapping of subregion, data note of the initial mapping based on the key supplied by data manufacturer or based on submission is recordedEach attribute recorded and one or more trigger criterias for partition data stream again.In some embodiments, for example,Hash function can be applied to a subregion key or multiple subregion keys, to generate the hashed value of 128 integers.Can will likelyThe range of 128 integers is divided into N number of continuous subinterval, and each subinterval represents one in N number of subregion of stream.SomeIn embodiment, the quantity of subregion and/or the relative size in subinterval can flow to from one another stream and be changed.At least oneIn a little embodiments, the client for configuring stream for its interests be can provide about the input for having partition scheme ready for use, such asThe quantity of required subregion or the required feature for having partition functions ready for use.In at least one embodiment, client canFor the data record of submission some subsets or all the data record submitted provides partition identifier or title.
When the data record of receiving stream, their corresponding subregions can be determined based on the key of supply and/or other attributes, andAnd the intake suitably organized, the subregion (step 2104) of storage and retrieval node for identification may be selected.In at least some embodiment partyIn case, corresponding sequence number can be generated for data record, such as instruction receives the sequence (step 2107) of the record of given subregion.In some implementations, sequence number may include many elements, such as timestamp value (for example, from such as 1 day 00 January in 1970:The past number of seconds of known epoch of 00:00UTC), from storage subsystem obtain subsequence value, SMS software version number and/Or partition identifier.In some embodiments, sequence number can be supplied to data manufacturer, such as to confirm the data submittedThe successful intake of record.In some embodiments, sequence number can also be used by data consumer, come to absorb ordered retrieval streamOr the data record of subregion.
In at least some embodiments, data record can be stored at memory node with sequence number sequence, be based on dividingArea's strategy guides data record to the memory node (step 2110).In the embodiment using rotating disk storage deviceIn, received data record can be saved to disk usually using being sequentially written in, to avoid disk Seek latency time.InIn at least some implementations, non-volatile cache can be used as write cache, example before by record storage to diskSuch as to be further reduced disk tracking a possibility that.Request in response to the multiple data records to sort to reading according to sequence number(for example, the calling for obtaining next record or similar interface), then can use sequence to read from storage device reads data log(step 2113).
Figure 22 is to may perform to realize the behaviour of the dynamic subregion again of data flow according to showing at least some embodimentsMake the flow chart of aspect.As shown in step 2201, (for example, at control unit of SMS or SPS) can make stream have it is pendingThe dynamically determination of subregion again.Many different trigger conditions can produce the decision of subregion stream again, such as intake, storage,The detection of overload at the one or more of retrieval, processing or control node, or in the workload level of different nodesUnbalanced detection, or can be from the request of client (such as data manufacturer or data consumer) received subregion again.In some implementations, the request of client subregion again may include the detail of the subregion again of request, such as needThe various parameters of the mapping of the modification of generation (for example, there is the quantity etc. of to be added or removal subregion, should combine or separate instituteState specified partition).In one implementation, subregion request can indicate that client wishes to solve the problems, such as state to client again(such as load imbalance), and SMS or SPS can be responsible for for the description of problem state being converted to division operation again appropriate.In some cases, instead of requesting subregion again or description problem state, client to may specify the touching for being ready to use in again subregionIssue of bidding documents is quasi-.In some embodiments, the determination of the change of the persistent data demand of data flow can trigger subregion again, this canSuch as generate the different group storage devices for flowing record or the selection of different memory technologies.In some cases, data flowThe detection of change of use pattern (for example, rate of production or consumption data record) can also cause subregion again, and alsoIt can cause the use of the different memory technology for being particularly suited for the use pattern changed or different group storage devices.For example, weightDetermining for new subregion can be based on to expectation, for giving subregion or all determinations of the rate read and write for flowing, SSD can be withIt is the memory technology being more suitable for than spinning disk.In one embodiment, arrangement or the software and/or hardware that will generateVersion, which changes, can trigger subregion again.In some cases, when client is indicated by using different partition method orWhen the budget limit that different storage methods can more effectively meet, price or billing issues can trigger subregion again.ExtremelyIn few some embodiments, the performance objective of change also can trigger subregion again.It is optional in the embodiment described in Figure 22Select be ready to use in the sequence number distributed after subregion again initial timestamp value (such as from 00:00 on January 1st, 1970:The offset of the second of 00UTC passes through the typically available epoch value of system calling in several operating systems) (step 2204).InIn some implementations, the global state manager realized at provider network can be supported to obtain epoch value(getEpochValue) API, such as so that the various parts of SMS and/or SPS, which can obtain, is ready to use in sequence number productionRaw consistent timestamp value.In other implementations, other times source can be used, such as may specify SMS or SPS control sectionPoint calls come the timestamp value consistently to sort to other component offer or the calling of usable local system.In some implementationsIn scheme, timestamp value necessarily corresponds to wallclock timestamp at any particular host, such as can be simply using monotone increasingInteger counter value.
(step can be mapped for the subregion of the modification of the raw mapping for being different from using when subregion again determines of the miscarriage2207).In at least some embodiments, before subregion again, the mapping of change can be by the data with particular zones keyRecord maps to the subregion different from the subregion that will there is the data record of same keys to map to.It may depend on for subregion againTrigger condition and/or separate some subregions (subregion usually largely used) depending on the work figureofmerit abided by, and canMerge other and (usually uses) subregion on a small quantity.In some embodiments, can be used after subregion again with subregion again itPreceding different partition functions, such as different hash functions, or the difference that hash function result is divided to Composition Region can be usedMethod.In some implementations that such as subregion corresponds to the successive range of 128 integers, it can be incited somebody to action after subregion again128 integer spaces are divided into different groups of subinterval.It, can be by intake, storage, retrieval, place at least some embodimentsThe new group of reason or control node distributes to newly created subregion.It in some implementations, can be by the number of space efficient combinationIt is used to represent the mapping (step 2208) of initial mapping and modification according to structure.For example, directed acyclic graph or tree structure can be stored,Wherein each entry includes partition functions output area (for example, corresponding to the model of the result of the subregion hash function of given subregionEnclose) and validity time range instruction, such that due to again subregion it is only necessary to change correspond to modification subregion noteRecord.The entry for subregion remained unchanged during subregion again can not need to modify in data structure.It can matchNew node is set to realize that the subregion of modification maps (step 2210).In at least some embodiments, due at least one sectionCan continue to that the retrieval request of data record stored on the basis of mapping before can be retained in time previous node andPrevious mapping is for a period of time.(the step 2213) when receiving the read requests for specifying specific sequence number or timestamp, canIt is made with (for example, at control node or at retrieval node) about whether read requests need by using new subregionThe determination that mapping or previous subregion map to meet.It then can be needed to identify from its acquisition request using selected mappingThe memory node appropriate of data.
Figure 23 is to may perform to realize for data flow record at least once according to showing at least some embodimentsThe flow chart of the operating aspect of record intake strategy.As shown in step 2301, it can be achieved that one or more programming interface so thatObtaining client selection can absorb tactful, the record intake strategy for the record of data flow from several intake policing optionsOption includes such as (a) tactful at least once, and according to the strategy at least once, record submitter will submit one or manylyIntake is tactful as possible until receiving affirmative acknowledgment, or (b) for record, strategy is absorbed as possible according to described, not at least oneA little records, which are submitted, provides confirmation.Some data production clients may worry them unlike other data produce clientRecord sub-fraction potential loss, and may therefore select acquisition method as possible.In some implementations, even ifFor being configured to the stream absorbed as possible, SMS can still provide the confirmation of some subsets for data record, or may be veryTo confirmation of the offer for all data records is attempted, even if confirmation of the strategy without need for each data record as possible.
Request can be received by a programming interface, the request instruction has the specific intake strategy for being ready to use in specified stream(step 2304).Intake node (step 2307) can be instantiated according to the effective intake strategy of the stream.It is saved when in intake(step 2310) when receiving one or more submissions of identical data record at point may depend on effective intake strategy to adoptTake different movements.If intake strategy (in such as step 2313 determined by) at least once is used, can for one orEach of multiple submissions send an acknowledgment to data manufacturer, but a data note can be only saved at storage subsystemIt records (2316).(it should be noted that N number of pair of given record can be stored in some cases according to for flowing effective persistence strategyThis, but may be only that a submission generates copy if data-oriented is submitted to record M times, that is, the transcript storedSum will be still N, rather than NxM.) if strategy is absorbed as possible effectively (as also detected in step 2313), it can storeStill a data record is saved at device, but does not need to send an acknowledgment to data manufacturer's (step 2319).At leastIn some embodiments, selected intake strategy can be at least partially based on optionally to determine client charging volume (step2322).As previously noted, in some embodiments, the intake strategy at least once of two versions can be supported.In a versionIn this, it is similar to that version shown in Figure 23, SMS can be responsible for deduplication data record (i.e., it is ensured that only in response to one group twoIt is a or more submit in one and store data at SMS storage subsystem).In the intake at least once of different editionsIn, allow the repetition of the data record by SMS.Later approach may be useful for streaming application, wherein depositingSeldom or there is no the duplicate negative consequences of data record, and/or the stream application for executing the repeated elimination of themselvesProgram may be useful.
Figure 24 is to may perform to realize a variety of persistence plans for being used for data flow according to showing at least some embodimentsThe flow chart of operating aspect slightly., it can be achieved that enabling a client to from multiple persistence strategies as shown in step 2401One or more programming interface of the selection for the persistence strategy of flow data record.Persistence strategy can appoint in all fieldsWhat one upper different from each other: the quantity of copy such as (a) to be saved can be different (for example, N number of copy can be supported to 2Copy is to single replication policy), (b) storage location/type of device ready for use can be different (for example, spinning disk is to SSD pairsRAM is to database service or multi-tenant storage device) and/or it is (c) described tactful in the restorative expection to extensive failureIt can difference (for example, multiple data centers can be supported to forms data Central Policy) in degree.It can receive request, the request instructionSelection (the step 2404) of the specific persistence strategy for specified stream of client.In some embodiments, by clientThe persistence strategy of selection can cause the different storage location types or the use of type of device of the respective partition for given stream.In one embodiment, SMS (rather than client) can select storage location class in stream grade or in subregion gradeType or type of device.In some embodiments, when selecting persistence strategy in some embodiments, client be can refer toShow data endurance target and/or performance objective (all readings as required or write-in output or delay), and these targetsStorage arrangement type appropriate or place can be selected by SMS use.For example, SSD generation can be used if necessary to less delayThe data record of one or more subregions or stream is stored for spinning disk.
One group of intake node be can determine or configured to receive the data record of selected stream from data manufacturer, and can be matchedOne group of memory node is set to realize selected persistence strategy (step 2407).When receiving data record at intake node(step 2410) can be recorded at selected storage device by memory node storing data based on selected persistence strategyOne or more copies, the memory node are responsible for the subregion (step 2413) of data record institute subordinate.In at least some realizationsIn mode, charging volume (step can optionally (and/or asynchronously) be determined based on the specified persistence strategy selected by client2416)。
The work load management of dispersion for stream process
In some embodiments, it can for example be realized in decentralized manner by giving the working node in the SPS stageA big chunk of the control plane function of SPS is all, and the given SPS stage is shared by such as database tableData structure operates to coordinate various controls (such as to the subregion distribution of working node, response dynamics again subregion, health monitoringAnd/or load balancing).Given working node W1 can check the entry in shared data structure, with determine it is for example current notHandle the inlet flow (if yes) of which subregion in the stage.If it find that this subregion P1, W1 are renewable sharedEntry in data structure, to indicate that W1 will execute the processing operation in the stage on the record of P1.Other working nodes canKnow W1 be assigned handle P1 record, and may therefore distribute different subregions to themselves.Working node can be regularOr inquiry is submitted to SMS control plane once in a while, to determine that the currently valid subregion for inlet flow maps, and when necessary moreNewly shared data structure is to indicate that mapping changes (such as due to again subregion).In various embodiments, load balancing andOther operations can also be coordinated by shared data structure, as described below.It, can in the implementation of some this dispersionsDedicated control node can not needed for SPS, thus expense needed for reducing realization SPS workflow.This dispersionSPS controls planar implementations may be especially by the welcome for the consumer for being concerned about budget, and the consumer utilizes SPS client libraryCome real at the calculated examples for example in the provider network for distributing to consumer or at the place outside provider networkThe various aspects of existing stream process.Such as when all resources for SMS and SPS are configured in provider network, dispersionSPS control plane technology can be also used in the embodiment of unused client library.Working node is realized at which for extremelyThe SPS of some or all of SPS control plane functions of few some processing stages can referred to herein as " decentralised controlSPS”。
The example that Figure 25 shows the stream processing system according at least some embodiments, the wherein working node of processing stageCoordinate their workload using database table.In decentralised control SPS 2590, define two processing stage 215A and215B, each processing stage have respective sets working node.Processing stage 215A includes working node 2540A and 2540B, and is locatedReason stage 215B includes working node 2540K and 2540L.For each of processing stage 215A and 215B, in databaseIt services and creates corresponding subregion allocation table at 2520, such as the subregion allocation table 2550A of processing stage 215A, for locatingThe subregion allocation table 2550B of reason stage 215B.In some embodiments, such as in response to client library component or the tune of functionWith subregion allocation table for giving the stage can be created during the stage initializes.Each subregion allocation table, which can be inserted, represents instituteState the entry of the unallocated subregion of the inlet flow in stage or initial set (point that i.e. no working node is currently allocated to of rowArea).The exemplary column or attribute of PA table clause are shown in FIG. 26 and are described below.Starting is for the stageWorking node (for example, process or thread for starting at calculated examples or other servers) can be imparted into the stageThe read/write of PA table accesses.Reading and writing in Figure 25 by being respectively used to work for PA table from working nodeArrow 2564A, 2564B, 2564K and 2564L of node 2540A, 2540B, 2540K and 2540L is indicated.
Given working node can be configured to by checking that the entry in PA table selects to execute the rank on itThe particular zones of the processing operation of section.In one implementation, working node 2540A can be scanned in subregion allocation table 2550AEntry, until it finds the entry of unappropriated subregion Pk, and can attempt by updating the entry for subregion Pk pointsDispensing its own, such as by the way that the identifier of working node is inserted into one of entry column.This insertion is believed thatSubregion is locked similar to by working node.Depending on the type of database service currently in use, can be used (for example, logicalCross two or more working nodes that the just almost the same time identifies unappropriated subregion) management is to PA table clauseThe distinct methods of potential concurrent write-in.
In one embodiment, the irrelevant multi-tenant database service of provider network, more rents can be usedStrong consistency and conditionity are supported in the service of user data library in the case where necessarily supporting relevant db transaction semanticWrite operation.In this case, it can be updated by working node use condition write operation.If distributed to without working nodeSubregion then considers wherein to arrange the mark that " working node ID " is used to refer to distribute to the particular job node of subregion in PA tableThe example of symbol, and the value of the column is configured to " null value ".Under this situation, the working node with identifier WID1 canRequest logic equivalent below: " if working node ID is null value in the entry for subregion Pk, then that will be used forThe working node ID of a entry is arranged to WID1 ".If this conditionity write request success, the work with identifier WID1It can be assumed that subregion Pk is assigned to it as node.Working node then can start for example to examine using the record of retrieval subsystem 208Rope interface retrieves the data record of subregion Pk, such as by arrow (such as be respectively used to working node 2540A, 2540B, 2540K andArrow 2554A, 2554B, 2554K and 2554L of 2540L) indicated by, and processing operation is executed on the record of retrieval.Such asThe write-in failure of fruit conditionity, working node can restart to search for different unappropriated subregions.In other embodiments, may be usedUsing the database service (such as relevant database) for supporting affairs, and transaction functionality can be used to realize conditionity write operationEquivalent, such as to ensure only to distribute to subregion one in the trials of multiple concurrent (or close to concurrent) of working nodeA success, and the working node involved in this concurrently trial is reliably notified their success or failure.OneIn a little embodiments, it can be used and the simultaneous techniques supported also not dependent on affairs both was written independent of conditionity.In some realitiesIn existing mode, database service can not used;Exclusiveness access can be obtained using locked service by working node on the contrary, usedIn the update to the entry in the persistent data structure for being similar to PA table.
Other working nodes can check the entry in PA table, determine which subregion is unappropriated, and can finally succeedGround one or more subregions are distributed to oneself.In this way, for an inlet flow in the stage or multiple inlet flowsThe workload of processing of subregion can be finally allocated in them by the working node in the stage.
The initial subregion mapping of any given stream can change over time, such as since the dynamic described before is divided againArea operates and changes.Therefore, in the embodiment described in Figure 25, one or more of working node can once in a while (orIn response to trigger condition as described below) it submits and requests to the SMS control subsystem 210 of the inlet flow in their stage, withObtain current subregion metadata.In some implementations, this request may include the calling of SMS control plane API, such asBy the calling of the acquisition stream information API of arrow 2544A, 2544B, 2544K and 2544L instruction.SMS control subsystem can be returned for exampleThe newest list and/or other details of the subregion of the multiple stream, the validity period of such as subregion.If controlling son by SMSThe partition information that system 210 provides mismatches the entry in PA table, then PA table can be modified by working node, such as passes throughEntry is inserted into or deleted for one or more subregions to modify.In at least some embodiments, SMS control subsystem is arrivedThis request can usually than record retrieval request (and/or database read or write operation) frequency it is much lower, such as by arrowIndicated by the label " infrequently " of 2554A.Once retrieval and place can be generally remained for example, working node is assigned subregionThe data record of that subregion is managed until partition data is consumed completely (for example, if described in owner's closing of the streamStream, perhaps if due to dynamically again subregion and close the subregion) or the situation until encountering some other low possibilities(for example, as discussed below, if different working nodes due to the load imbalance that detects and request partition turnsIt moves).Therefore, in various embodiments, expense associated with calling acquisition stream information or similar API may be usually phaseWhen small, even if being provided with a large amount of information (if hundreds and thousands of a subregions are defined in response to any given callingThe inlet flow in stage, it would be possible that can be such case).
In the embodiment described in Figure 25, the work load management operation of some keys of decentralised control SPS environment canTherefore it is summarized as follows: access database table (a) being at least partially based on by first working node in stream process stage to select to flowThe particular zones of the input traffic of processing stage are realized on the stream process stage and are limited at one group of that stageReason operation;(b) indicator by the distribution of particular zones to the first working node is written in the particular items being stored in table;(c) it is used by the first working node in the programmed recording Retrieval Interface retrieval particular zones that multi-tenant flow management service center realizesRecord;(d) described group of processing operation is realized on the record of particular zones by the first working node;(e) by the second working nodeThe particular items being at least partially based in particular database table determine the first working node of distribution to execute institute on particular zonesState a group processing operation;And different subregions (f) is selected by the second working node, described group is executed on the different subregionProcessing operation.And if be retained in the subregion for being assigned to it when working node is determined without more record, workMetadata on the inlet flow from SMS control subsystem can be requested by making node, and can if metadata indicates differenceUpdate PA table.
Figure 26, which is shown, is storable in the subregion allocation table 2550 coordinated for workload according at least some embodimentsIn exemplary entries.As shown, subregion allocation table 2550 may include four column: the column of partition id 2614, assignment nodeThe column of ID 2618, the column of working node health indicator 2620 and workload level indicator 2622 arrange.In other implementationsIn, it can be achieved that other column setting, such as can be used instruction partition creating time or sectoring function defeated in some embodimentsIt is worth the column of range out, or can be arranged without using workload level indicator.
It should be noted that in some embodiments, partition list 2650 at SMS control subsystem (such as subregion itemThe part of the data structures of mesh tree, figure or other combinations described before) it at least at some time points may include than quiltIt is included in more subregions in subregion allocation table 2550.Partition list 2650 in the example of description, at SMS control subsystemIncluding subregion P1, P2, P3, P4 and P5, wherein P1 and P4 is shown as the closed state due to subregion again, and P2, P3 and P5It is shown to effective (i.e. its data record be currently retrieved and handling subregion).In the embodiment of description,Partition list 2650 at SMS control subsystem includes for efficient zoned entry, and does not include the subregion for closingEntry (for example, when working node subregion again generation after obtain to obtain stream information call response when, the entryIt may be deleted by working node).At least in some implementations, and all subregions when front opening of non-streaming are givenCan there must be the corresponding entry in PA table at time point;It current distribute or is handling on the contrary, for example can only presentThe subset of those subregions.
In the Exemplary contexts being shown in FIG. 26, subregion P2 and P3 are assigned to the work for being respectively provided with identifier W7 and W3Make node, and P5 is currently unappropriated.In different implementations, healthy indicator column 2620 can store different types ofValue.In some implementations, working node can be responsible for periodically (for example, intuitively pushing away every N seconds, or according to based on some groupsDisconnected arrangement) content of healthy indicator column in the PA entry for the subregion that they are distributed is updated, to indicate that working node isIt is effective and can continue to them retrieves and processes operation.In Figure 26, the working node for the entry can be storedThe instruction of the nearest time (" last modification time ") of healthy indicator column is updated, such as working node W7 is shown as 2013Entry is had been modified by 02:24:54 and 53 second of on December 1, in.In some embodiments, other working nodes can be used mostWhether modification time value is healthy to determine assignment node afterwards, for example, if passing by X seconds or X minutes, as being used for instituteIt states defined in the failover policy in stage, assignment node may be assumed unhealthy or inaccessible, and the subregion can be redistributed.In other implementations, counter can be used as to healthy indicator (exampleSuch as, if Counter Value has not been changed in Y seconds, assignment node can be considered as the candidate shifted for failure), orIt can be used when instruction assignment node reads " last reading the time " value of entry for the last time.
In at least some embodiments, 2622 value of workload level indicator can be deposited for example by assignment nodeStorage is in the entry, locating such as during some nearest time intervals (for example, in five minutes before last modification time)Quantity, the nearest performance-relevant index of working node of the record of reason, such as cpu busy percentage, memory utilization rate, storageUtilization ratio of device etc..In some embodiments, this workload level indicator value can be used by working node, is with determinationNo there are load imbalances, as follows described in Figure 29, and take action in response to detect unbalanced.ExampleSuch as, working node Wk can determine that its workload level has been more than that average workload is horizontal, and can not distribute in its subregionOne, or can be with request dynamic again subregion;Alternatively, working node Wk can determine its workload relative to other worksMake too low for the workload of node, and other subregion can be distributed for its own.Therefore, in the embodiment of descriptionIn, by using the column of PA table indicated in Figure 26, one in the control plane function of same type is can be performed in working nodeA bit, the control plane function can be executed usually by dedicated SPS control node in central controlled SPS implementation
Figure 27 is shown can be selected by the execution of the working node in stream process stage at it according at least some embodimentsThe operating aspect of the upper subregion for executing processing operation.It, can be in the SPS processing stage for decentralised control as shown in step 2701PA table PAT1 is initialized at the database service of SP1.It can be for example when the host for example from client facility or from supplierCalculated examples at network data center, which are called when SPS client library component, creates the table.Client library can be used for variousPurpose: for example to provide for having the executable component for staying in the particular procedure realized at SPS stage operation, such as JAR(JavaTMAchieve) file, with indicating label (such as program name, process title or calculated examples title), the label canFor identify working node, be used to refer to need to be used as input for the stage stream, be used to refer to the defeated of the stageDestination (if yes) etc. out.It in some embodiments, can be initially PAT1 filling for being defined for the stageInlet flow subregion { P1, P2 ... } at least one subset entry or row.In some implementations, it can initially protectIt is vacant to hold table, and one or more working nodes can for example be due to obtaining subregion metadata from SMS control subsystemTable filling is used for the row of unappropriated subregion.It can be at each calculated examples for example in provider network or in clientStart the initial set (step 2704) of working node { W1, W2 ... } at all computing devices.In the embodiment of descriptionIn, working node can be assigned to PAT1 and read and write access.
When working node occurs online, they can access PAT1 respectively with the unappropriated subregion that tries to find out.For example,Working node W1 can check PAT1 and find that subregion P1 is unappropriated (step 2707).W1 then can be for example depending onThe type of the database service used is come by using the write request of conditionity or businesslike update request and in PAT1The entry of P1 is updated, P1 is distributed to by W1 (step 2710) with instruction.By having updated the table, W1 can be examined by using SMSLarge rope system interface starts the retrieval (step 2713) of the data record of P1, and stage PSl can be executed on the record of retrievalProcessing operation.
Meanwhile at some time points, different working node W2 can access PAT1 with the trial of their own, with discoveryUnappropriated subregion (step 2716).W2 can distribute P1 based on the determination of more newly arriving before W1, but unallocated differentSubregion P2.In some embodiments, by the current of W2 (such as healthy indicator column in the entry based on the P2) P2 madeAlso bootable W2 selects P2 to the unhealthy or inactive determination of assignment node.Therefore, at least some embodiments,The determination of unallocated state or the unhealthy condition of work at present node can be used to select for redistributing (or initial pointWith) given subregion.W2 then can attempt to update PAT1 so that P2 is distributed to oneself (step 2719).If be updated successfully, W2SMS Retrieval Interface can be begun to use to retrieve P2 record (step 2722), and execute and be defined for the appropriate of the stageProcessing operation.
As previously mentioned, the working node in the SPS of decentralised control (usually non-frequently) can obtain subregion from SMS and reflectInformation is penetrated, and is come if necessary using this information update PA table.Figure 28 show according at least some embodiments can be byThe working node in stream process stage, which executes, to be come based on the information update subregion allocation table obtained from flow management service control subsystemOperating aspect.As shown in step 2801, (such as divide during working node initialization or in response to various trigger conditionsOne closing in its subregion of dispensing), working node W1 request can be submitted to SMS control subsystem with obtain it is nearest orCurrent partition list or effective partition list.In some implementations, acquisition stream information can be called for this purposeOr similar API.In some embodiments, other trigger conditions can be used: for example, after the time of random quantity or loudIt should be in unexpected the decreasing or increasing of workload level, working node can respectively be configured to obtain new partition list.It canThe partition list returned by SMS is compared (step 2807) with the entry in the PA table for being used for the subregion.In descriptionIn embodiment, if it find that (for example, if there are some points not in PA table in the partition list of newest acquisition in differenceArea, or if there is the not entry in the list of SMS in PA table), working node can be inserted into or delete item in PA tableMesh, to solve the difference (step 2810).If (in some implementations, currently had with the entry for deleting as targetAssignment node it would be possible that needing other coordination, such as can notify the work of distribution directly or through PA table itselfMake node.)
After adjusting the difference, or if not detecting difference, one group of subregion, In is may be selected in working node W1Working node W1 should execute the processing operation (step 2813) in the stage on described group of subregion, and PA therefore may be updatedTable.In some cases, depending on the trigger condition for causing to retrieve partition list, W1 may have one for distributing to itOr multiple subregions, and may not be needed to make a change its distribution or update PA table.W1 then can be must not be withIn the case that SMS control subsystem interacts or changes the quantity of the entry in PA table, continue to retrieve its distributed oneThe data record of a subregion or multiple subregions, and handle the record (step 2816).Finally, when detecting trigger condition(such as when the equivalent of " partition end of arrival " response is received retrieval request, to indicate that subregion is to close) isNewest partition information W1 can send again to SMS control subsystem and request, and repeatable step 2801 forward operation.
Figure 29 shows the load balancing that can be executed by the working node in stream process stage according at least some embodimentsThe aspect of operation.As shown in step 2901, working node W1, which can determine to work as, detects any one of various trigger conditions,It such as detects when high resource utilization level or stays on its stage based on configurable arrangement and execute load balancingAnalysis.W1 can check the entry (step 2904) in PA table, to determine the various work figureofmerits for being used for the stage.This fingerMark may include the average for distributing to the subregion of working node, working node or different subregions average work load it is horizontalThe range or distribution of (being stored in the embodiment in table by workload level indicator), each working node workloadDeng.
W1 then can (such as the quantity based on the subregion for distributing to W1 and/or workload level indicator by subregion) generalThe workload of its own is compared with some or all of of the index.In general, in the conclusion that can get three typesAny one: W1 be overload, W1 be underload or W1 workload it is both less high or less low.It can be by by oneThe client for configuring the stage in a little embodiments for its interests is selected tactful or lead in other embodimentsIt crosses using the heuristics of some default settings and limits the workload level of " excessively high " or " too low ".If W1 determines its workToo low (step 2907) is measured, such as less than some minimal load threshold T1, then can recognize busier or more high load workNode Wk (step 2910).W1 then can for example by attempting, the Pm entry in modification PA table, (this may be produced this modification of requestGenerating the notice for Wk) or start one or more subregion Pm being transferred to W1 from Wk by direct request WkProcess (step 2913).
If W1 determines the excessively high (step 2916) of its workload, such as can recognize one or more more than max-thresholds T2, W1A subregion Pn that it is distributed is to abandon (that is, release to be allocated by other working nodes) (step 2919).W1 then canSuch as entry (the step appropriate in PA table is modified by column its identifier of removal of the distribution from the entry for Pn2922).If the workload of W1 is both less high or less low or has taken a variety of action described above after W1 to increaseOr its workload is reduced, W1 can start the process over its record (step 2925) for being assigned subregion extremely.When and if symbolWhen closing the condition for triggering the analysis of another load balancing, the forward operation corresponding to step 2901 is repeated.It should be noted that schemingIn operation shown in 29, W1 is shown as only just starting when its workload relative to its own detects unbalancedThe variation of workload.In other embodiments, if W1 detects unbalanced in other working nodes in addition to itself,For example, if W1 can start again balanced action when W1 determines that W2 has the workload level more much lower than W3.In some realizationsIn mode, and if when W1 detects that workload is unbalanced, W1 can (such as by call it is all as shown in Figure 3 againSubregion stream (repartitionStream) SMS API and its equivalent) request or start dynamic subregion again.In some implementationsIn scheme, many kinds of operations shown in Figure 29 can be executed by the working node configured recently, such as grasped when in the stageWhen new node being added to the stage after a period of time in work, new node can come from heavy duty existing section by requestThe subregion of point is redistributed to notify their presence of existing node indirectly.In some embodiments, can also at one orIt is used at multiple SMS subsystems or alternatively using the dispersion for SPS working node for being similar to those described aboveControl technology, such as intake, storage or retrieval subsystem node can be used similar to PA table shared data structure comeCoordinate their workload.
It should be noted that in various embodiments, can be used in addition in the flow chart of Figure 17-Figure 24 and Figure 27-Figure 29The operation of those of shown operation, to realize flow management service and/or stream process function described above.In some embodiment partyIn case, some in the operation that shows can not be realized, can perhaps be realized in a different order or parallel rather than sequentiallyIt realizes.It shall yet further be noted that about each of the SMS and SPS function that programming interface in various embodiments is supported, it is a kind ofOr any combination of multiple technologies can be used to realize the interface, including webpage, network address, network service API, other API, orderColumn tool, graphic user interface, mobile applications (app), tablet computer app etc..
Use case
Establish the mensurable dynamic based on subregion of the acquisition for flow data record, storage, retrieval and interim processingThe technique described above of the configurable managed multi-tenant service of state may be useful in many kinds of situations.For example, large-scaleProvider network may include thousands of example host, to realize while be used for many differences of ten hundreds of clientsMulti-tenant or single tenant service Service Instance.The monitoring installed in various examples and host and/or charging proxy can be fastSpeed generates thousands of index records, it may be necessary to and it stores and analyzes the index and record to generate the accurate station message recording,The effective supply plan of the data center for provider network is determined, detecting network attack etc..The record of monitoring canThe inlet flow of the SMS for mensurable intake and storage is formed, and can realize finger of the SPS technology of description for acquisitionTarget analysis.Similarly, it acquires and analyzes from many Log Sources (for example, the application of the node from distributed application programProgram log, or the system log of host or calculated examples at data center) a large amount of log recording applicationProgram can also can utilize SMS and SPS function.In at least some environment, SPS processing operation may include that real-time ETL (is extractedConversion load) processing operation, (that is, received data record is converted to the operation for being used to be loaded into destination in real time, withoutIt is to carry out the conversion offline), or the conversion of the data record for being inserted into data warehouse.Using for by dataBe loaded into data warehouse in real time SMS/SPS combination will the data be inserted into warehouse in be used for analyze before can avoid pairIn the delay cleaned and data of the arrangement from one or more data sources are commonly required.
Many different " big data " application programs can also be used SMS and SPS technology to construct.For example, stream can be usedEfficiently perform the analysis of the trend in various forms of social media interactions.It can will acquire from mobile phone or tablet computerData, the location information of such as user is as stream record to manage.Such as the audio or video acquired from a monitor camera group of planesInformation can represent the flow data that can be acquired and be handled to potentially contribute to prevent various types of attacks in a manner of mensurableAnother type of collection.It needs for example to look in the distance from meteorological satellite, the sensor based on ocean, the sensor based on forest, astronomyThe scientific application program of the analysis of the growing data set of mirror acquisition may also benefit from flow management and processing as described hereinAbility.Config option and priced option based on flexible policy can help different types of user customization to be suitble to, and theirs is specified pre-It calculates and persistent data/usability requirements stream function.
The embodiment of the disclosure can be described in view of following clause:
1. a kind of system comprising:
One or more computing devices, are configured to:
First group of programming interface is realized, so that the client of multi-tenant flow management service can be absorbed from a variety of dataIt is specific data flow selection data intake strategy in strategy, wherein a variety of data intake strategies include absorbing at least onceStrategy, according to the intake strategy at least once, the instruction of data record is transferred to institute one or manyly by record submitterFlow management service is stated, until receiving affirmative acknowledgment;
Second group of programming interface is realized, so that the client can be described from a variety of persistent data strategiesSpecific data flow selects persistent data strategy, wherein a variety of persistent data strategies include more copy persistence plansSlightly, according to more copy persistence strategies, need to be managed service by flow tube in the corresponding storage location storage numberAccording to multiple copies of record;
Service center, which is managed, in flow tube passes through the corresponding of first group of programming interface and second group of programming interfaceTo receive the client, to be that the specific data flow selection is described absorb the first of strategy at least once and refer to programming interfaceShow and the client has been that the specific data flow selects the second of more copy persistence strategies to indicate;
Multiple transmission of the specific data record of service are managed in response to instruction flow tube,
At least one affirmative acknowledgment for corresponding to the multiple transmission is sent according to the strategy of intake at least once;WithAnd
In response to the specific transmission of the multiple transmission, according to more copy persistence strategies on multiple storage groundThe copy of the storage specific data record at point.
2. the system as described in clause 1, wherein a variety of data intake strategies include absorbing strategy as possible, according to describedAt least some data records that tactful flow tube reason service has the specified stream of to be subjected and storage are absorbed as possible, without mentioning for recordFriendship person provides corresponding positive acknowledgment.
3. the system as described in clause 1, wherein including by more copy persistence strategies that the client selectsIt is ready to use in the instruction of the type of the storage location of storing data transcript, wherein the type of the storage location includes followingOne of items: (a) storage device based on disk, (b) solid state drive (SSD), (c) volatibility RAM (deposit by arbitrary accessReservoir), (d) non-volatile ram, (e) data base management system or (f) by provider network realize network-accessible store clothesThe memory node of business.
4. the system as described in clause 1, wherein the more copy persistence strategies selected by the client include askingThe instruction for the data endurance level asked, wherein one or more of computing devices are also configured to:
The data endurance level of the request is at least partially based on to select the multiple storage location.
5. the system as described in clause 1, wherein one or more of computing devices are also configured to:
One or more of the following terms is at least partially based on to determine and need for flow management operation to specific visitorThe charging volume that family end is collected: strategy and (b) (a) are absorbed by the specific visitor by the specific selected data of clientThe selected persistent data strategy in family end.
6. a kind of method comprising:
The following terms is executed by one or more computing devices:
One group of programming interface is realized, so that the client of flow management service can be from a variety of data intake strategySpecific data flow selection data intake is tactful, wherein a variety of data intake strategies include that intake at least once is tactful, rootAccording to the intake strategy at least once, record submitter needs that the instruction of data record is transferred to the stream one or manylyManagement service, until receiving affirmative acknowledgment;
Request is received by the programming interface of described group of programming interface, the request indicates that the client has been describedThe specific data flow selection intake strategy at least once;
In response to receiving multiple transmission of the specific data record of instruction flow tube reason service center,
The corresponding affirmative for corresponding to each transmission of the multiple transmission is sent according to the strategy of intake at least onceConfirmation;And
In response to receiving the specific transmission of the multiple transmission, the institute of the specific data flow is used for according to selectionState the copy that persistent data strategy stores the specific data record at one or more storage locations.
7. the method as described in clause 6, wherein a variety of data intake strategies include absorbing strategy as possible, according to describedAt least some data records that tactful flow tube reason service has the specified stream of to be subjected and storage are absorbed as possible, without mentioning for recordFriendship person provides corresponding positive acknowledgment.
8. the method as described in clause 6, wherein flow tube reason service is configured according to the intake strategy at least onceTo eliminate duplicate data record.
9. the method as described in clause 6, wherein flow tube reason service is configured according to the intake strategy at least onceIt is more than a copy to correspond to the multiple transmission to store the specific data record.
10. the method as described in clause 6 further includes being executed by one multiple computing devices:
Second group of programming interface is realized, so that the client can be described from a variety of persistent data strategiesSpecific data flow selects the persistent data strategy.
11. the method as described in clause 10, wherein a variety of persistent data strategies include more copy persistence strategiesWith single copy persistence strategy.
12. the method as described in clause 10, wherein the persistent data strategy includes to be ready to use in storing data recordStorage location type instruction, wherein the type of the storage location includes one of the following terms: (a) be based on diskStorage device, (b) solid state drive (SSD), (c) volatibility RAM (random access memory), (d) non-volatile ram, (e)The memory node of data base management system or the network-accessible storage service (f) realized by provider network.
13. the method as described in clause 10, wherein the persistent data strategy includes to be ready to use in the specific numberAccording to the instruction of the first kind of the storage location of the first subregion of stream, and there is the second subregion for being ready to use in the specific data flowStorage location different types of instruction.
14. the method as described in clause 10, wherein not including by the persistent data strategy that the client selectsThe instruction for needing the type of the storage location recorded for storing data further includes being held by one multiple computing devicesRow:
Need the storage location recorded for storing data by one or more subassembly selections of flow tube reason serviceType.
15. the method as described in clause 14, wherein the type of the selection storage location includes selection for described specificData flow the first subregion storage location the first kind, and selection be used for the specific data flow the second subregionStorage location different type.
16. the method as described in clause 10 is also wrapped wherein the persistent data strategy includes the instruction of target delayIt includes and is executed by one multiple computing devices:
It is stand-by to have selected that the target delay is at least partially based on by one or more components of flow tube reason serviceIn the type of the storage location of storing data record.
17. the method as described in clause 10, wherein the persistent data strategy selected by the client includes askingThe instruction for the data endurance level asked further includes being executed by one multiple computing devices:
The data endurance level of the request is at least partially based on to select multiple storage locations, the data note of the streamRecord needs to be stored at the multiple storage location.
18. the method as described in clause 6 further includes being executed by one multiple computing devices:
One or more of the following terms is at least partially based on to determine and need for flow management operation to specific visitorThe charging volume that family end is collected: strategy and (b) (a) are absorbed by the specific visitor by the specific selected data of clientThe selected persistent data strategy in family end.
19. the method as described in clause 6 further includes being executed by one multiple computing devices:
Partitioning strategies are at least partially based on to determine the multiple data intake for needing to be configured to the specific data flowNode, according to the partitioning strategies: one or more attributes (a) based on the specific data record are by the specific numberNode is absorbed according to the data for recording the member for being appointed as particular zones and the multiple data (b) being selected to absorb node to absorbThe data record of the particular zones.
20. the method as described in clause 6 further includes being executed by one multiple computing devices:
Partitioning strategies are at least partially based on to determine the multiple data storage for needing to be configured to the specific data flowNode, according to the partitioning strategies: one or more attributes (a) based on the specific data record are by the specific numberThe member of particular zones is appointed as according to record and (b) selects the data memory node of the multiple data memory node to storeThe data record of the particular zones.
21. the method as described in clause 6 further includes being executed by one multiple computing devices:
Storage can be used to the corresponding sequence number of the read requests in response to the ordered set to data record, the sequence numberEach data record of multiple data records corresponding to the specific data stream, including the specific data record;
The minmal sequence number for needing to be stored, which is received, from the submitter of the different data records of the data flow corresponds to instituteState the instruction of different data records;And
Storage is greater than or equal to the specific sequence number of the minmal sequence number, and the specific sequence number corresponds to describedDifferent data records.
22. a kind of non-transitory computer for storing program instruction may have access to storage medium, described program instruction is when oneWhen being executed on a or multiple processors:
One group of programming interface is realized, so that the client of flow management service can absorb strategy based on the data of selectionLasting data is selected from a variety of persistent data strategies to be ingested the specific data flow in the stream for its data recordProperty strategy, wherein a variety of persistent data strategies include: (a) more copy persistence strategies, it is lasting according to more copiesProperty strategy, have the multiple copies for staying in the data record that the specific data flow is stored at corresponding storage location, and (b) singleCopy persistence strategy, wherein needing to be stored the single copy of the data record of the specific data flow;
Request is received by the programming interface of described group of programming interface, the request indicates that the client has been describedSpecific data flow selects more copy persistence strategies;And
Multiple memory nodes are configured to realize more copy persistence plans for the data record of the specific data flowSlightly.
23. the non-transitory computer as described in clause 22 may have access to storage medium, wherein a variety of persistent datasAt least one of strategy persistent data strategy includes the instruction for being ready to use in the type of storage location of storing data record,Wherein the type of the storage location includes one of the following terms: (a) storage device based on disk, the driving of (b) solid-stateDevice (SSD), (c) volatibility RAM (random access memory), (d) non-volatile ram, (e) data base management system or (f) byThe memory node for the network-accessible storage service that provider network is realized.
24. the non-transitory computer as described in clause 22 may have access to storage medium, wherein selected by the clientMore copy persistence strategies include the instruction of data endurance level of request, wherein described instruction ought one orWhen being executed on multiple processors:
The data endurance level of the request is at least partially based on to determine multiple memory nodes to be configured.
It further include by one multiple 25. the non-transitory computer as described in clause 22 may have access to storage mediumComputing device executes:
It is at least partially based on and is grasped to determine for flow management by the persistent data strategy of the specific client selectionMake the charging volume for needing to be collected to specific client.
It further include by one multiple 26. the non-transitory computer as described in clause 22 may have access to storage mediumComputing device executes:
Partitioning strategies are at least partially based on to determine the multiple data storage for needing to be configured to the specific data flowNode, according to the partitioning strategies: one or more attributes (a) based on the specific data record are by the specific numberThe member of particular zones is appointed as according to record and (b) selects the data memory node of the multiple data memory node to storeThe data record of the particular zones.
27. a kind of system comprising:
One or more computing devices, are configured to:
Determining in multiple nodes of multi-tenant flow management service has point to be applied to distribute the data record of data flowArea's strategy, wherein the partitioning strategies include the initial mapping that multiple subregions are recorded in data, the initial mapping is at least partlyBased on one or more attribute values associated with the data record;
The first subregion is identified using the initial mapping, needs to be at least partially based on specific attribute value for described firstThe specific data record of the data flow of subregion is appointed as member;
It is generated at the intake node of flow tube reason service and indicates that record obtains the specific data note in sequenceThe sequence number of the position of record, the sequence number corresponds to the specific data record, wherein the intake node is at least partlyIt is selected based on the initial mapping;
The corresponding sequence number for being at least partially based on the multiple data record remembers multiple data of first subregionRecord is sequentially stored at the data storage location of flow tube reason service, wherein the data storage location at least portionDivide and is selected based on the initial mapping;And
In response to having met the determination of the trigger criteria for data flow described in subregion again,
The mapping that the modification of subregion is recorded in data is generated,
In the case where the pause that no data record for arranging the data flow obtains, begin to use the modificationMapping;And
To select at least one of the following terms, relaying with another data record of the specific attribute valueBegin to use the mapping of the modification receives other data records later: (a) service of flow tube reason is differentAbsorb node or (b) the different data storage locations of flow tube reason service.
28. the system as described in clause 27, wherein the sequence number includes the instruction of the following terms: (a) with it is described specificData record the associated timestamp of intake and (b) other subsequence value.
29. the system as described in clause 28, wherein one or more of computing devices are also configured to:
Selection has the initial time stamp value of the sequence number for being ready to use in data record, and the data record has to be used described repairThe mapping that changes is mapped;
It is requested in response to data record retrieval, the data record retrieval request indicates specific sequence number,
It is lower than the initial time stamp value in response to the value of the specific timestamp as indicated by the specific sequence numberDetermination, one or more data records are retrieved using the initial mapping;And
It is not less than the determination of the initial time stamp value in response to the described value of the specific timestamp, is repaired using describedOne or more data records are retrieved in the mapping changed.
30. the system as described in clause 27, wherein the trigger criteria includes one or more of the following terms: (a)The detection of overload, (b) the unbalanced detection of workload, (c) for the client request of subregion again, (d) dataThe determination of the change of the persistent data demand of stream, (e) determination for the arrangement that software version changes, (f) data flow makesWith the detection of the change of mode, determination that (g) price of data flow described in subregion influences again or (h) with the data flow phaseThe determination of associated performance objective.
31. the system as described in clause 27, wherein one or more of computing devices are also configured to:
Client request is received, the client request instruction there are the one or more subregion marks for being ready to use in the data flowIt is quasi-;And
The client request is at least partially based on to generate the initial mapping.
32. a kind of method comprising:
The following terms is executed by one or more computing devices of flow management service:
One or more attribute values of the data record are at least partially based on to determine that it is more that the data of data flow are recordedThe initial mapping of a subregion;
The first subregion is identified using the initial mapping, needs to be at least partially based on specific attribute value for described firstThe specific data record of the data flow of subregion is appointed as member;
The specific data record is stored in and is at least partially based on the initial mapping and at the storage location that selects;And
Meet trigger criteria in response to determination,
The mapping that the modification of subregion is recorded in data is generated, and
Different storage locations is selected for another data record with the specific attribute value, relaying starts to makeOther data records are received later with the mapping of the modification.
33. the method as described in clause 32 further includes being executed by one multiple computing devices:
Before having met the determination of the trigger criteria, at least partly base in the specific data recordProcessing operation is executed at the working node that the initial mapping selects;And
After having met the determination of the trigger criteria, in the different data with the particular attribute-valueThe processing operation is executed at the different working nodes that select being at least partially based on the mapping of the modification on record.
34. the method as described in clause 32 further includes being executed by one multiple computing devices:
It is generated at the intake node of flow tube reason service and indicates that record obtains the specific data note in sequenceThe sequence number of the position of record, the sequence number corresponds to the specific data record, wherein the intake node is at least partlyIt is selected based on the initial mapping;And
To correspond to the data record of the first subregion described in certain sequential storage of the sequence number of data record.
35. the method as described in clause 34, wherein the sequence number includes the instruction of the following terms: (a) with it is described specificData record the associated timestamp of intake and (b) other subsequence value.
36. the method as described in clause 35, wherein the clock of the specific data record is absorbed in timestamp instructionTime, wherein the method also includes being executed by one multiple computing devices:
In response to retrieval request, by will sequence number associated with one or more of data records as index keyIt requests to need to be at least partially based on specified record intake time model to retrieve one or more of data records to useIt encloses to retrieve one or more data records.
37. the method as described in clause 35 further includes being executed by one multiple computing devices:
Selection has the initial time stamp value of the sequence number for being ready to use in data record, and the data record has to be used described repairThe mapping that changes is mapped;
Specific sequence number is indicated in response to receiving data record retrieval request,
In response to determining that the value of the specific timestamp as indicated by the specific sequence number is lower than the initial timeTimestamp value retrieves one or more data records using the initial mapping;And
It is not less than the initial time stamp value in response to the described value of the determination specific timestamp, utilizes the modificationMapping retrieve one or more data records.
38. the method as described in clause 32, wherein the mapping of the modification is come really using at least one other attribute valueDetermine the subregion of data record.
39. the method as described in clause 32 further includes being executed by one multiple computing devices:
Client request is received, the client request instruction there are the one or more subregion marks for being ready to use in the data flowIt is quasi-;And
The client request is at least partially based on to generate the initial mapping.
40. the method as described in clause 32 further includes being executed by one multiple computing devices:
Client request is received, the client request indicates the trigger criteria.
41. the method as described in clause 32 further includes being executed by one multiple computing devices:
Client request is received with data flow described in subregion again, wherein the client request indicates reflecting for the modificationThe one or more parameters penetrated.
42. the method as described in clause 32 further includes being executed by one multiple computing devices:
Client request is received by subregion again, the client request instruction is solved the problems, such as with possible as targetState.
43. the method as described in clause 32 further includes being executed by one multiple computing devices:
Hash function is applied to at least part of the content of the specific data record, with obtain be expressed as includeSelect the hashed result of the binary value of the position of quantity;
Determine the specific subinterval of the range of the denotable binary value in position by using selection quantity, the hashAs a result it is subordinated to the specific subinterval;And
The subinterval is at least partially based on to identify first subregion.
44. the method as described in clause 32, wherein one or more of attribute values include at least one in the following termsIt is a: (a) the subregion key supplied by data record source, (b) identification in data record source, (c) at least the one of the content of data recordPart or (d) network address in the data record source.
45. the method as described in clause 32 further includes being executed by one multiple computing devices:
After the mapping for generating the modification, compared with the configuration carried out before the mapping for generating the modification,The node of the different number of flow tube reason system is configured, to execute one or more of the following terms for the data flow:(a) data record is absorbed, (b) data record storage or (c) data record retrieval.
46. the method as described in clause 32 further includes being executed by one multiple computing devices:
Storage represents the combined data structure of the mapping of the initial mapping and the modification, wherein the combined numberIt include: (a) first entry according to structure, the first entry indicates that specific data record attribute is reflected according to the initial mappingThe first subregion being incident upon and the initial mapping are suitable for the time range of first subregion, and (b) second entry, and described theTwo entries indicate the different subregion that the specific data record attribute is mapped to according to the mapping of the modification and describedThe mapping of modification is suitable for the different time ranges of the different subregion.
47. the method as described in clause 46, wherein the combined data structure includes one in the following terms: (a)Tree or (b) directed acyclic graph.
48. the method as described in clause 32, wherein the mapping of the modification includes one as indicated by the initial mappingCombined instruction to subregion.
49. a kind of non-transitory computer for storing program instruction may have access to storage medium, described program instruction is when oneWhen being executed on a or multiple processors:
Determine have it is to be applied come in multiple nodes of flow management service distribute data flow data record partitioning strategies,Wherein the partitioning strategies include the instruction that the initial mapping of multiple subregions is recorded in data;
Configuration flow tube manages first group of intake node of service to receive the data of the stream according to the initial mappingRecord and first group of data memory node of flow tube reason service according to the initial mapping storing data to record;
In response to having met for the dynamically determination of the trigger criteria of data flow described in subregion again,
Generate the mapping that the modification of different multiple subregions is recorded in data;
It is configured to different group intake nodes that the received data after the mapping for generating the modification records and notWith group data memory node;And
At least in a specific period, retain the number stored at first group of back end according to the initial mappingAccording to record, the data record of arrival is stored according to the mapping of modification during the specific period.
50. the non-transitory computer as described in clause 49 may have access to storage medium, wherein described instruction is when described oneWhen being executed on a or multiple processors:
The initial set of the data retrieval node of the data flow is configured to according to the initial mapping;And
In response to having met the determination of the trigger criteria, it is configured to the data inspection of the different groups of the data flowSocket point.
51. non-transitory computer as described in clause 49 may have access to storage medium, wherein the trigger criteria include withOne or more of lower items: (a) detection of overload, (b) the unbalanced detection of workload, (c) for subregion againClient request, (d) determination of the change of the persistent data demand of the data flow, (e) software version change arrangementDetermination, (f) detection of the change of the use pattern of the data flow, (g) again data flow described in subregion price influenceDetermining or (h) determination of performance objective associated with the data flow.
52. the non-transitory computer as described in clause 49 may have access to storage medium, wherein described instruction is when described oneWhen being executed on a or multiple processors:
Client request is received, the client request instruction there are the one or more subregion marks for being ready to use in the data flowIt is quasi-;And
The client request is at least partially based on to generate the initial mapping.
53. the non-transitory computer as described in clause 49 may have access to storage medium, wherein described instruction is when described oneWhen being executed on a or multiple processors:
Receive client request, the triggering mark of the client request instruction for data flow described in subregion againIt is quasi-.
Illustrative computer system
In at least some embodiments, some or all of one or more services of the techniques described herein are realizedDevice may include general-purpose computing system, and the general-purpose computing system includes or be configured to access one or more computers canMedium is accessed, the techniques described herein include realizing the portion of SMS subsystem (for example, intake, storage, retrieval and control subsystem)The technology of part and SPS working node and control node.Figure 30 shows this general-purpose calculating appts 9000.In the embodiment party shownIn case, computing device 9000 includes the one or more processors that system storage 9020 is connected to by I/O interface 90309010a, 9010b or 9010n.Computing device 9000 further includes the network interface 9040 for being connected to I/O interface 9030.
In various embodiments, computing device 9000 can be the list for including processor 9010a, 9010b or a 9010nProcessor system, or including several processor 9010a, 9010b or 9010n (such as two, four, eight or another suitable numberAmount) multicomputer system.Processor 9010a, 9010b or 9010n can be any suitable processor for being able to carry out instruction.For example, in various embodiments, processor 9010a, 9010b or 9010n can appoint to realize in various instruction set architectures (ISA)A kind of what general or embeded processor of framework, the framework such as x86, PowerPC, SPARC or MIPS ISA or anyOther suitable ISA.In a multi-processor system, each processor 9010a, 9010b or 9010n can be realized usually but not necessarilyIdentical ISA.In some implementations, alternative conventional processor or the external of processor in addition to routine use figureProcessing unit (GPU).
System storage 9020 can be configured to store the instruction that can be accessed by processor 9010a, 9010b or 9010nAnd data.In various embodiments, any suitable memory technology can be used to realize for system storage 9020, the storageDevice technology such as static random access memory (SRAM), synchronous dynamic ram (SDRAM), non-volatile/flash type memory orThe memory of any other type.In the shown embodiment, the program instruction sum number of one or more required functions is realizedCode 9025 is illustrated as according to (such as those described above method, technology and data) and data 9026 are stored in system storageIn 9020.
In one embodiment, I/O interface 9030 can be configured to coprocessor 9010a, 9010b or 9010n, beThe I/O flow between any peripheral unit in system memory 9020 and device, the peripheral unit includes network interface 9040Or other peripheral interfaces, such as various types of persistence of the physical copy of storing data object subregion and/or volatileProperty storage device.In some embodiments, any required agreement, timing or other data conversions can be performed in I/O interface 9030So that the data-signal for coming from a component (for example, system storage 9020) is converted into being suitable for by another component (exampleSuch as, processor 9010a, 9010b or 9010n) format that uses.In some embodiments, I/O interface 9030 may include forBy the support for the device that various types of peripheral buses are attached, for example peripheral parts interconnected (PCI) bus of peripheral busStandard or universal serial bus (USB) standard change form.In some embodiments, the function of I/O interface 9030 can divideAt in two or more individual components, such as north bridge and south bridge.Equally, in some embodiments, I/O interface 9030Some or all of functions, such as interface to system storage 9020, can be directly incorporated into processor 9010a, 9010b orIn 9010n.
Network interface 9040 can be configured to allow data in computing device 9000 and be attached to one or more networksIt is swapped between 9050 other devices 9060 (other computer systems or device shown in such as Fig. 1 to Figure 29).In each embodiment, network interface 9040 can be supported through any suitable wired or wireless general data network (exampleAs ethernet network type) it is communicated.In addition, network interface 9040 can be supported through telecommunication/telephone network (such as mouldQuasi- speech network or digital fiber communication network), by storage area network (such as fiber channel SAN) or pass through any otherThe network and/or agreement of suitable type are communicated.
In some embodiments, system storage 9020 can be as matched above with respect to described in Fig. 1 to Figure 29Set to store the computer accessible of program instruction and data embodiment, with for realizing corresponding method andThe embodiment of equipment.However, in other embodiments, can be received in different types of computer accessible,Send or store program instruction and/or data.In general, computer accessible may include the storage medium of non-transitoryOr storage medium, such as magnetic medium or optical medium, such as it is connected to by I/O interface 9030 magnetic of computing device 9000Disk or DVD/CD.It can also include that can be used as system storage 9020 or another kind of that non-transitory computer, which may have access to storage medium,The memory of type is included in any volatibility or non-volatile media in some embodiments of computing device 9000, such asRAM (for example, SDRAM, DDR SDRAM, RDRAM, SRAM etc.), ROM etc..In addition, computer accessible may include passingDefeated medium or signal, electric signal, electromagnetic signal or the number such as transmitted by communication media (such as network and/or Radio Link)Word signal can such as be realized by network interface 9040.In various embodiments, all multiple meters as shown in Figure 30Calculating some or all of device can be used to realize the function;For example, the software component kimonos run on various different devicesBusiness device may cooperate to provide the function.In some embodiments, in addition to or instead of using general-purpose computing system is come in factExisting, storage device, network equipment or dedicated computer system can be used to realize for the part of the function.Art as used hereinLanguage " computing device " refers to the device of at least all these types, and is not limited to the device of these types.
Conclusion
Each embodiment can also include that sending and receiving are connect in computer accessible according to what the description of front was realizedIt send or store instruction and/or data.In general, computer accessible may include storage medium or storage medium(such as magnetic medium or optical medium, such as disk or DVD/CD-ROM), volatibility or non-volatile media (such as RAM (exampleSuch as, SDRAM, DDR, RDRAM, SRAM etc.), ROM etc.) and transmission medium or signal (such as pass through communication media (such as networkAnd/or Radio Link) transmission signal (such as electric signal, electromagnetic signal or digital signal)).
Such as the exemplary implementation scheme of various method representation methods shown in the figure and described herein.The method canTo be realized in software, hardware or combinations thereof.The sequence of method can change, and each element can be added, arrange againSequence, combination, omission, modification etc..
Those skilled in the art in benefit of this disclosure, which will be clear that, can carry out various modifications and change.Be intended to comprising it is all thisA little modifications and variations, and correspondingly, above description should be regarded as having illustrative and not restrictive meaning.

Claims (15)

CN201480061590.7A2013-11-112014-11-11Data flow intake and persistence technologyActiveCN105765575B (en)

Applications Claiming Priority (5)

Application NumberPriority DateFiling DateTitle
US14/077,162US9858322B2 (en)2013-11-112013-11-11Data stream ingestion and persistence techniques
US14/077,171US9720989B2 (en)2013-11-112013-11-11Dynamic partitioning techniques for data streams
US14/077,1712013-11-11
US14/077,1622013-11-11
PCT/US2014/065052WO2015070232A1 (en)2013-11-112014-11-11Data stream ingestion and persistence techniques

Publications (2)

Publication NumberPublication Date
CN105765575A CN105765575A (en)2016-07-13
CN105765575Btrue CN105765575B (en)2019-11-05

Family

ID=53042245

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201480061590.7AActiveCN105765575B (en)2013-11-112014-11-11Data flow intake and persistence technology

Country Status (5)

CountryLink
EP (1)EP3069275A4 (en)
JP (2)JP6357243B2 (en)
CN (1)CN105765575B (en)
CA (1)CA2930026C (en)
WO (1)WO2015070232A1 (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US9880769B2 (en)2015-06-052018-01-30Microsoft Technology Licensing, Llc.Streaming joins in constrained memory environments
US9942272B2 (en)*2015-06-052018-04-10Microsoft Technology Licensing, Llc.Handling out of order events
US10148719B2 (en)2015-06-052018-12-04Microsoft Technology Licensing, Llc.Using anchors for reliable stream processing
US10868741B2 (en)2015-06-052020-12-15Microsoft Technology Licensing, LlcAnchor shortening across streaming nodes
US9602455B2 (en)*2015-08-072017-03-21Machine Zone, Inc.Scalable, real-time messaging system
US20220014555A1 (en)2015-10-282022-01-13Qomplx, Inc.Distributed automated planning and execution platform for designing and running complex processes
US11570209B2 (en)2015-10-282023-01-31Qomplx, Inc.Detecting and mitigating attacks using forged authentication objects within a domain
KR102181640B1 (en)2016-05-172020-11-23아브 이니티오 테크놀로지 엘엘시 Distributed reconfigurable processing
CN107783834B (en)*2016-08-302021-05-07伊姆西公司Method and system for processing data
US10594760B2 (en)*2017-01-252020-03-17Futurewei Technologies, Inc.Intelligent event streaming
US11314648B2 (en)*2017-02-082022-04-26Arm LimitedData processing
US10719495B2 (en)*2017-02-092020-07-21Micron Technology, Inc.Stream selection for multi-stream storage devices
CN107612687B (en)*2017-09-252021-04-27西安建筑科技大学 A dynamic multi-copy data possession verification method based on ElGamal encryption
US11120052B1 (en)2018-06-282021-09-14Amazon Technologies, Inc.Dynamic distributed data clustering using multi-level hash trees
US10884644B2 (en)2018-06-282021-01-05Amazon Technologies, Inc.Dynamic distributed data clustering
US11163737B2 (en)*2018-11-212021-11-02Google LlcStorage and structured search of historical security data
US11989186B2 (en)*2018-11-232024-05-21Amazon Technologies, Inc.Scalable architecture for a distributed time-series database
US11934409B2 (en)2018-11-232024-03-19Amazon Technologies, Inc.Continuous functions in a time-series database
GB2608754A (en)*2018-11-232023-01-11Amazon Tech IncScalable architecture for a distributed time-series database
US11409725B1 (en)2019-02-042022-08-09Amazon Technologies, Inc.Multi-tenant partitioning in a time-series database
US11853317B1 (en)2019-03-182023-12-26Amazon Technologies, Inc.Creating replicas using queries to a time series database
CN110147354B (en)*2019-04-192023-06-02平安科技(深圳)有限公司Batch data editing method, device, computer equipment and storage medium
US11599516B1 (en)2020-06-242023-03-07Amazon Technologies, Inc.Scalable metadata index for a time-series database
JP7564449B2 (en)2021-03-292024-10-09富士通株式会社 Data processing program, information processing system and data processing method
US11461347B1 (en)2021-06-162022-10-04Amazon Technologies, Inc.Adaptive querying of time-series data over tiered storage
US11941014B1 (en)2021-06-162024-03-26Amazon Technologies, Inc.Versioned metadata management for a time-series database
CN114116732B (en)*2022-01-242022-04-05北京奥星贝斯科技有限公司Transaction processing method and device, storage device and server
US11941029B2 (en)2022-02-032024-03-26Bank Of America CorporationAutomatic extension of database partitions
CN114676117B (en)*2022-05-272022-08-16成都明途科技有限公司 A post data storage method, device and post robot
US11960774B2 (en)2022-07-202024-04-16The Toronto-Dominion BankSystem, method, and device for uploading data from premises to remote computing environments
CN117666928A (en)*2022-08-302024-03-08华为云计算技术有限公司 A data access method and system
CN115480914B (en)*2022-09-022023-07-21江苏安超云软件有限公司Method and system for realizing multi-tenant service

Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US5813009A (en)*1995-07-281998-09-22Univirtual Corp.Computer based records management system method
US6272598B1 (en)*1999-03-222001-08-07Hewlett-Packard CompanyWeb cache performance by applying different replacement policies to the web cache
JP2010146067A (en)*2008-12-162010-07-01Fujitsu LtdData processing program, server apparatus, and data processing method
CN102428463A (en)*2009-05-282012-04-25贺利实公司Multimedia system providing database of shared text comment data indexed to video source data and related methods
JP2012523023A (en)*2009-09-182012-09-27株式会社日立製作所 Storage system that eliminates duplicate data
JP2013539877A (en)*2010-09-172013-10-28オラクル・インターナショナル・コーポレイション Performing partial subnet initialization in a middleware machine environment

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US5692177A (en)*1994-10-261997-11-25Microsoft CorporationMethod and system for data set storage by iteratively searching for perfect hashing functions
JP4057074B2 (en)*1996-04-252008-03-05富士通株式会社 Stream data striping method and stream server
US6505216B1 (en)*1999-10-012003-01-07Emc CorporationMethods and apparatus for backing-up and restoring files using multiple trails
WO2005109212A2 (en)*2004-04-302005-11-17Commvault Systems, Inc.Hierarchical systems providing unified of storage information
US7814056B2 (en)*2004-05-212010-10-12Computer Associates Think, Inc.Method and apparatus for data backup using data blocks
US7606844B2 (en)*2005-12-192009-10-20Commvault Systems, Inc.System and method for performing replication copy storage operations
US7921077B2 (en)*2006-06-292011-04-05Netapp, Inc.System and method for managing data deduplication of storage systems utilizing persistent consistency point images
US7716186B2 (en)*2007-01-222010-05-11International Business Machines CorporationMethod and system for transparent backup to a hierarchical storage system
US8190960B1 (en)*2007-12-132012-05-29Force10 Networks, Inc.Guaranteed inter-process communication
US8533478B2 (en)*2008-10-242013-09-10Hewlett-Packard Development Company, L. P.System for and method of writing and reading redundant data
US20100257140A1 (en)*2009-03-312010-10-07Philip John DavisData archiving and retrieval system
US8560639B2 (en)*2009-04-242013-10-15Microsoft CorporationDynamic placement of replica data
US9137304B2 (en)*2011-05-252015-09-15Alcatel LucentMethod and apparatus for achieving data security in a distributed cloud computing environment
JP5544523B2 (en)*2011-07-192014-07-09日本電信電話株式会社 Distributed processing system, distributed processing method, load distribution apparatus, load distribution method, and load distribution program
JP2013058101A (en)*2011-09-082013-03-28Interlink:KkCloud computing system
US20130085995A1 (en)*2011-09-292013-04-04International Business Machines CorporationManaging back up operations for data
WO2013069073A1 (en)*2011-11-072013-05-16株式会社日立製作所Time sequence data management system, apparatus and method
US20130124483A1 (en)*2011-11-102013-05-16Treasure Data, Inc.System and method for operating a big-data platform
US8886781B2 (en)*2011-12-132014-11-11Microsoft CorporationLoad balancing in cluster storage systems
US9098344B2 (en)*2011-12-272015-08-04Microsoft Technology Licensing, LlcCloud-edge topologies

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US5813009A (en)*1995-07-281998-09-22Univirtual Corp.Computer based records management system method
US6272598B1 (en)*1999-03-222001-08-07Hewlett-Packard CompanyWeb cache performance by applying different replacement policies to the web cache
JP2010146067A (en)*2008-12-162010-07-01Fujitsu LtdData processing program, server apparatus, and data processing method
CN102428463A (en)*2009-05-282012-04-25贺利实公司Multimedia system providing database of shared text comment data indexed to video source data and related methods
JP2012523023A (en)*2009-09-182012-09-27株式会社日立製作所 Storage system that eliminates duplicate data
JP2013539877A (en)*2010-09-172013-10-28オラクル・インターナショナル・コーポレイション Performing partial subnet initialization in a middleware machine environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"让我们制作服务器-Fun Sakura Cloud";大井勇人;《樱花知识.URL:http://knowledge.sakura.ad.jp/sacloud/1126/ 》;20131028;全文*
CIAD DM OV Ingestion;Michael Meisinger;《OOI Confluence Site.URL:https://confluence.oceanobservatories.org/display/syseng/》;20121025;全文*
了解Salesforce 的API使用合作工具;荒木一成;《TerraSky TECH Blog. URL:https://www.terrasky.co.jp/blog/2013/130709_001294.php》;20130709;正文第1-4页*

Also Published As

Publication numberPublication date
CA2930026A1 (en)2015-05-14
JP2018133105A (en)2018-08-23
CN105765575A (en)2016-07-13
JP2017501515A (en)2017-01-12
CA2930026C (en)2020-06-16
EP3069275A4 (en)2017-04-26
EP3069275A1 (en)2016-09-21
JP6510112B2 (en)2019-05-08
JP6357243B2 (en)2018-07-11
WO2015070232A1 (en)2015-05-14

Similar Documents

PublicationPublication DateTitle
CN105765575B (en)Data flow intake and persistence technology
CN105723679B (en) System and method for configuring a node
CN105706086B (en) Management services for ingesting, storing, and consuming large-scale data streams
US10691716B2 (en)Dynamic partitioning techniques for data streams
AU2014346366B2 (en)Partition-based data stream processing framework
US20150134795A1 (en)Data stream ingestion and persistence techniques
CN107567696A (en)The automatic extension of resource instances group in computing cluster
CN105706047B (en)Data Stream Processing frame based on subregion

Legal Events

DateCodeTitleDescription
C06Publication
PB01Publication
C10Entry into substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp