Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.
In some data processing methods, cluster incremental data synchronization is completely calculated in real time through a master library (leader) of cluster election, so that the query bottleneck of a Database (DB) is basically relieved, but the memory problem is still serious.
As shown in fig. 1, a binlog message is issued when a data change occurs in the DB. binlog is a binary log of MySQL that records all Data Definition Language (DDL) and Data Manipulation Language (DML) (except data query statements) statements, in the form of events, and contains the time consumed by the statement execution. Thus, whenever a DB changes (e.g., increases, decreases, changes) data occurs, it can be notified to the master library (leader) cluster by a binlog message.
Then, the binlog message is received and parsed by the acquisition component to obtain parsed information. Since the binlog message is a binary file, further parsing is required. Additionally, the acquisition component may be a nal, but the disclosure is not limited thereto and may be any suitable middleware for parsing binlog messages. The cananal is a middleware developed by java and based on database incremental log parsing to provide incremental data subscription and consumption.
Then, the master library calculates the analysis information to obtain changed data. Typically, the parsed information after the cananal parsing is sent to a master library or master library cluster through a message distribution system (e.g., kafka). The master library cluster is used for consuming binlog messages, calculating and assembling advertisement casting data in real time, and pushing calculation results to a slave library (slave) cluster or downstream in real time through kafka. Thus, the master library cluster will no longer provide online service capabilities, but rather a real-time computing engine. It should be understood that the push system kafka described above is merely exemplary and is not intended to limit the present disclosure. In addition, the information pushed by kafka is gradually consumed in a queue manner, and for convenience of description, the position of the consumed information at the current time in the queue may be referred to as a consumption site or a consumption progress.
Before the DB data changes, the memory of the main library cluster has data synchronous with the DB. When the data in the DB is changed, the data is notified to the main library cluster through a binlog message, the main library cluster calculates and assembles the data based on the change information to obtain the changed data, and the changed data in the main library cluster and the data after the DB are changed are kept synchronous again.
In addition, the slave library clusters receive push messages from the master library cluster and process to update data in the memory of the slave library clusters such that the data in the memory of the slave library clusters is consistent with the altered data in the master library clusters. By synchronous pushing of the master library cluster and the slave library clusters, synchronization and consistency of the slave library clusters with the data in the DB are maintained.
As shown in fig. 1, the master library cluster may periodically backup a memory file, and after the backup is completed, notify the slave library cluster to load. And the slave library clusters load the memory files after receiving the notification of the master library cluster, and perform full load. For example, after the master library is integrated to backup a memory file to the external storage TBS, information is sent to the slave library cluster indicating that the memory file has been backed up. The slave library that received the notification message now begins to cache the backup file. As further shown with reference to fig. 2, a portion of the secondary libraries in the secondary library cluster are cached for providing online services and another portion are cached for storing memory files backed up by the primary library cluster to external storage. After the file loading is completed, the buffer memory loaded with the backup file is replaced with the online service buffer memory, and the full loading is completed. Since there are two caches in the full load process from the library cluster, this is a double cache mechanism.
The present disclosure provides a data processing method that moves service machine memory to external distributed storage. Specifically, as shown in fig. 3, the method of the present disclosure includes step 101, performing incremental loading on data in a database through a first data node (e.g., a master library or a master library cluster), the first data node calculating advertisement data in real time and writing memory data into an external memory, the memory of the first data node being used for maintaining advertisement data id and index data, wherein the external memory is configured to maintain full advertisement data. It should be understood that although advertisement data is illustrated as an example in the present disclosure, the present disclosure is not limited thereto. The manner of incremental loading of the master library may be the same as in fig. 1. The first data node may be fragmented and may include a plurality of fragments and a plurality of copies. The method of the present disclosure further comprises step S102, the second data node (e.g. an agent or from a library cluster) receiving index data from the first data node and reading the data from the external memory using the index data.
As described above, the master library service machine may maintain only the advertisement data id and the index data, and synchronize the index data to the slave library machine. The master library writes the full amount of data to the external memory and the slave library reads the relevant data from the external memory. In this way, the memory bottleneck of the service machine is not a problem since the master library service machine only maintains the advertisement data id and index data, the slave library reads data from the external memory, and does not maintain the full amount of data itself. In addition, the external memory can be laterally expanded, so that the memory bottleneck problem can be eliminated.
The following detailed description is made in connection with fig. 4, and it should be understood that the following specific examples are only for better understanding of the present invention and are not intended to limit the present disclosure.
As shown in fig. 4, delta information is sent from the database DB in response to a data change of the DB, wherein the delta message is a binlog-based delta message. Thus, the delta data path of the present disclosure will no longer be based on timing queries of, for example, 10s, but rather on the binlog delta message of mysql. When a data change occurs in the DB, a binlog message is issued. binlog is a binary log of MySQL that records all DDL and DML (except data query statements) statements, in the form of events, and contains the time spent by the statement execution. Thus, whenever a DB changes (e.g., increases, decreases, changes) data, it can be notified to the master library cluster by a binlog message.
The binlog message is received and analyzed by the acquisition component to obtain analysis information. Since the binlog message is a binary file, further parsing is required. Additionally, the acquisition component may be a nal, but the disclosure is not limited thereto and may be any suitable middleware for parsing binlog messages. The cananal is a middleware developed by java and based on database incremental log parsing to provide incremental data subscription and consumption. This part of the path is the same as above.
Then, the analysis information is calculated to obtain changed data. Typically, the parsed information after the canaal parsing is sent to a first data node (e.g., a master library or a master library cluster) through a message distribution system (e.g., kafka). The master library cluster is used to consume binlog messages, calculate in real-time the advertisement data and write into external memory (e.g., tbase). The memory of the main library only maintains the advertisement id and the md5 (index data) corresponding to the data, no longer maintains the advertisement data, and no longer provides remote procedure call protocol (RPC) service. In addition, the master library may synchronize the index data to a slave library cluster (proxy cluster).
In the present disclosure, a master library cluster may be sliced, e.g., including slice 0, slice 1, slice 2, and slice 3. In a database environment, sharding may result in creating smaller partitions in the ledger. Thus, these partitions are referred to as tiles. In each tile, multiple copies are included. At a restart of the master machine or some other operation, the roles of the master (e.g., master 1, master 2, master 3, and master 4) may be played by the corresponding copies. It should be appreciated that while a master library cluster is shown in FIG. 4 as including 4 shards and each shard includes 2 copies, the number of shards and copies is merely exemplary and other numbers of shards and copies may be included.
The secondary library clusters may query the relevant data from external memory (e.g., tbase) using the index data. The slave cluster may only be responsible for providing RPC services and no longer keep the complete data file. In this way, the memory bottlenecks of the master library and the slave library clusters are resolved. The prior method for storing complete data by the memory in the master library and the slave library is limited by the fact that the memory cannot be laterally expanded, and the problem of memory bottleneck exists. By writing the full amount of data to the external memory, e.g., tbase, the external memory supports lateral expansion, thus solving the problem that the memory is limited because it cannot be laterally expanded. Thus, online service capability may be improved.
In the above links of fig. 4, there may be only incremental loading, and no full loading. Full load may be accomplished through a data module. The data module can consume binlog data as the main library of the cluster, and the memory is maintained in the advertisement throwing id and the index data md5. In addition, the master library in the data module may also include copies. The data module may read the full data id (e.g., full in-put ad id) of the last time version in another external memory (e.g., redis, not shown in fig. 4), where the full data id in the other external memory redis is in the form of a timestamp, redis also records the kafka message location corresponding to the last time version, so that the data module may perform data backtracking and update the full in-put ad id of the new time version and send to redis. The full load data is then divided into a plurality of buckets (e.g., 100-200 buckets), and the data for each bucket is incrementally loaded along with incremental data written to Tbase by the master library, and in the same manner, memory spurs caused by full-volume global replacement can be eliminated, and the data validation time can be improved.
In some embodiments, a discrepancy/compensation component may exist between the data module and the master library cluster for finding discrepancies between the data module and the master library cluster, and data compensation is performed when discrepancies are found.
The data module of the present disclosure may only load data related to the status of the advertisement being cast, for real-time online and offline of the advertisement, with other data no longer being loaded. Because the data module only maintains the in-cast ad id and status data, memory is not a bottleneck for the data module. The full-load data module divides advertisement data into barrels (for example, 100-200 barrels), and sequentially refreshes data of each barrel into an external memory Tbase. By adopting the barrel division mode, the data module uniformly distributes full-load to incremental load, and the full-load is smoothly realized in a streaming data updating mode, so that memory spurs of the full-load are eliminated, and the effective time of data is also improved.
Thus, the master library cluster is responsible for writing data to the external memory Tbase, while the slave library cluster is responsible for reading data from the external memory, and the read-write cluster separation is performed. Furthermore, by employing proxy (proxy) clusters, traffic is isolated and data storage shards are masked. Thus, access to the memory may be imperceptible to the user. Therefore, the above factors all enable the data processing architecture of the present disclosure to support lateral expansion, and the limitation of the service machine memory bottleneck is eliminated. With the increase of the data service amount, the present disclosure can be laterally expanded by increasing the number of data service nodes, so as to eliminate the memory bottleneck problem.
The present disclosure also provides a data processing apparatus, including: a first data node (e.g., a master library or master library cluster) configured to incrementally load data in the data library; the external memory, the first data node writes the memory data into the external memory, the memory of the first data node is used for maintaining data id and index data, wherein the external memory is configured to maintain the full data; and a second data node receiving the index data from the first data node and reading the data from the external memory.
In some embodiments, the data processing apparatus further comprises: and the data module is used for carrying out full-load on the database, uniformly dispersing the full-load data into a plurality of barrels, and loading the data of each barrel in the plurality of barrels and the incremental-load data of the first data node together through the external memory.
In addition, the present disclosure also provides an electronic device, including: at least one memory and at least one processor; the memory is used for storing program codes, and the processor is used for calling the program codes stored in the memory to execute the data processing method.
Furthermore, the present disclosure also provides a computer storage medium storing a program code for executing the above-described data processing method.
In some embodiments, the present disclosure solves the problem that memory cannot be laterally expanded by storing data to external memory, the master library cluster and external memory being fragmented, and performing read-write cluster separation, the master library writing data to external memory, and the slave library reading data from external memory, eliminating the memory bottleneck of data services. In addition, by adding proxy clusters (slave clusters), the shards of the data store are masked so that the shards of the data store are not perceived by the user.
Referring now to fig. 5, a schematic diagram of anelectronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, theelectronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from astorage 506 into a Random Access Memory (RAM) 503. In theRAM 503, various programs and data required for the operation of theelectronic apparatus 500 are also stored. Theprocessing device 501, theROM 502, and theRAM 503 are connected to each other via abus 504. An input/output (I/O)interface 505 is also connected tobus 504.
In general, the following devices may be connected to the I/O interface 505:input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; anoutput device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like;storage 506 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow theelectronic device 500 to communicate with other devices wirelessly or by wire to replace data. While fig. 5 shows anelectronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 609, or fromstorage device 506, or fromROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by theprocessing device 501.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects an internet protocol address from the at least two internet protocol addresses and returns the internet protocol address; receiving an Internet protocol address returned by the node evaluation equipment; wherein the acquired internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a data processing method including: performing incremental loading on data in a database through a first data node, wherein the first data node is configured to write memory data into an external memory, and the memory of the first data node is used for maintaining data id and index data, and the external memory is configured to maintain full data; index data is received from the first data node by a second data node and data is read from the external memory.
According to one or more embodiments of the present disclosure, further comprising: and carrying out full-load on the database through a data module, wherein the data module is configured to uniformly disperse the full-load data into a plurality of barrels, and the data of each barrel in the plurality of barrels is loaded through the external memory together with the incrementally-loaded data of the first data node.
According to one or more embodiments of the present disclosure, incrementally loading data in the database by a first data node includes: responding to the data change of the database, and sending out incremental information through the database; receiving the incremental information through an acquisition component and analyzing the incremental information to obtain analysis information; and sending the analysis information to the first data node through a message issuing system, and carrying out incremental loading based on the analysis information through the first data node.
In accordance with one or more embodiments of the present disclosure, the message publishing system also transmits the parsing information to the data module.
According to one or more embodiments of the present disclosure, full loading of the database by the data module includes: and reading out the full data of the first time version from another external memory, carrying out data backtracking through the consumption site in the message release system corresponding to the first time version, and updating to obtain the full data id of the second time version.
According to one or more embodiments of the present disclosure, further comprising: a difference between the data module and the first data node is determined by a difference compensation module and data compensation is performed when there is a difference.
In accordance with one or more embodiments of the present disclosure, the first data node includes a plurality of shards and a plurality of replicas.
There is also provided, in accordance with one or more embodiments of the present disclosure, a data processing apparatus including: the first data node is configured to perform incremental loading on data in the database; the external storage is used for writing memory data into the external storage by the first data node, and the memory of the first data node is used for maintaining data id and index data, wherein the external storage is configured to maintain full data; a second data node receives index data from the first data node and reads data from the external memory.
According to one or more embodiments of the present disclosure, further comprising: and the data module is used for carrying out full-volume loading on the database, uniformly dispersing the full-volume loaded data into a plurality of barrels, and loading the data of each barrel and the incrementally loaded data of the first data node through the external memory.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one memory and at least one processor; the memory is used for storing program codes, and the processor is used for calling the program codes stored in the memory to execute the data processing method.
According to one or more embodiments of the present disclosure, there is provided a computer storage medium storing program code for executing the above-described data processing method.
The foregoing description is only of the preferred embodiments of the present disclosure and description of the principles of the technology being employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of features described above, but also covers other embodiments which may be formed by any combination of features described above or equivalents thereof without departing from the spirit of the disclosure. Such as those described above, are mutually substituted with the technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.