US9934287B1 - Systems and methods for expedited large file processing - Google Patents


Publication number
US9934287B1
Authority
US
United States
Prior art keywords
records
file
data
data format
focus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/659,143
Inventor
Japan Bhatt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital One Services LLC
Original Assignee
Capital One Services LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital One Services LLC
Priority to US15/659,143
Assigned to CAPITAL ONE SERVICES, LLC (assignment of assignors interest; see document for details). Assignor: BHATT, JAPAN
Priority to US15/905,163 (US10191952B1)
Application granted
Publication of US9934287B1
Priority to US16/233,796 (US10949433B2)
Priority to US17/201,311 (US11625408B2)
Priority to US18/297,957 (US12111838B2)

Abstract

A system includes one or more memory devices storing instructions, and one or more processors configured to execute the instructions to perform steps of a method for processing a large file. The system may receive record data comprising a plurality of records having an identification value in a common field having a data format. The system may determine a plurality of focus values based on the data format and create a plurality of virtual processing units based on the plurality of focus values. Each of the plurality of virtual processing units may process a sub-group of the plurality of records that corresponds to the focus value associated with the respective virtual processing unit.

Description

FIELD OF INVENTION
The present disclosure relates to systems and methods for expedited large file processing, and more particularly for dynamically creating a number of virtual processing units to perform parallel processing of one or more large data files based on associated focus values.
BACKGROUND
Businesses often store, access, use, and provide access to very large data files as part of their business operations, such as files containing numerous records relating to customer information, vendor information, or employee information. From time to time, large files such as these require processing to implement a global change, such as adding a new field (for example, a new type of account number or employee ID number) to each and every record of the file. Processing such large files by conventional methods, such as processing each record serially on a single processor, can be extremely time-consuming. The processing time would increase further if a processing error occurred, because the entire file might then have to be reprocessed. To reduce processing time, some businesses rely on parallel processing techniques involving multiple processors operating simultaneously. However, using multiple processors for parallel processing adds overhead in the form of additional infrastructure that must be acquired, set up, and maintained.
Accordingly, there is a need for improved systems and methods to process large files quickly, simply, and efficiently. Embodiments of the present disclosure are directed to this and other considerations.
SUMMARY
Disclosed embodiments provide systems and methods for improved processing of large files.
Consistent with the disclosed embodiments, the system may include one or more memory devices storing instructions, and one or more processors configured to execute the instructions to perform steps of a method to process a large file. The system may execute the instructions to receive record data comprising a plurality of records, where each of the plurality of records may comprise an identification value in a common field having a data format. The system may determine a plurality of focus values based on the data format, where each of the plurality of focus values is unique and corresponds to a sub-group of the plurality of records. The system may create a plurality of virtual processing units that are each associated with a unique one of the plurality of focus values. The system may process, by each of the plurality of virtual processing units, the respective sub-group of the plurality of records that corresponds to the focus value associated with the respective virtual processing unit in response to searching the record data.
Consistent with the disclosed embodiments, methods for processing large files are also disclosed.
Further features of the disclosed design, and the advantages offered thereby, are explained in greater detail hereinafter with reference to specific embodiments illustrated in the accompanying drawings, wherein like elements are indicated by like reference designators.
BRIEF DESCRIPTION OF THE DRAWINGS
Reference will now be made to the accompanying drawings, which are not necessarily drawn to scale. The drawings, which are incorporated into and constitute a portion of this disclosure, illustrate various implementations and aspects of the disclosed technology and, together with the description, serve to explain the principles of the disclosed technology. In the drawings:
FIG. 1 is a work flow diagram of an exemplary large file processing system;
FIG. 2 is a component diagram of an exemplary large file processing device including exemplary virtual processing units;
FIG. 3A is an exemplary large file having a plurality of records;
FIG. 3B is an exemplary large file having a plurality of records that have been modified by the large file processing system to add a new data field according to an example embodiment;
FIG. 4 is a flowchart of an exemplary system for processing a large file; and
FIG. 5 is a flowchart of another exemplary system for processing a large file.
DETAILED DESCRIPTION
Some implementations of the disclosed technology will be described more fully with reference to the accompanying drawings. This disclosed technology may, however, be embodied in many different forms and should not be construed as limited to the implementations set forth herein. The components described hereinafter as making up various elements of the disclosed technology are intended to be illustrative and not restrictive. Many suitable components that would perform the same or similar functions as components described herein are intended to be embraced within the scope of the disclosed electronic devices and methods. Such other components not described herein may include, but are not limited to, for example, components developed after development of the disclosed technology.
It is also to be understood that the mention of one or more method steps does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Similarly, it is also to be understood that the mention of one or more components in a device or system does not preclude the presence of additional components or intervening components between those components expressly identified.
As used herein, “common field” may refer to a data record field that is common to a plurality of data records. For example, a plurality of records may all have a “social security number field” which is designed to store a social security number associated with the record.
As used herein, “identification value” may refer to the data stored or input into a common field of a record. For example, if the common field is a social security number field, an identification value may be the particular social security number that is stored or entered into the social security number field of a particular record.
As used herein, “data type” may refer to a set of possible data entries that may be input or stored at one or more particular character positions of a data format. For example, data types may include numbers, integers, letters, alphanumeric characters, binary numbers, base ten numbers, ASCII values, hexadecimal values, or any other predefined set of values, characters, and/or symbols that may be stored in a field of a record.
As used herein, “data format” may refer to a predefined sequence of a specified number of character positions, wherein each character position has an associated data type. For example, a “license plate data format” may be a sequence of seven character positions, wherein the first three character positions have a letter data type (i.e., each of the first three characters of the license plate data format must be a letter) and the last four character positions have a number data type (i.e., each of the last four characters of the license plate data format must be a number). Thus, an identification value in a field associated with a license plate data format may be required to be a sequence of seven characters, where the first three characters are letters and the last four characters are numbers.
As used herein, “character position” or “character” may refer to a position within a predefined sequence, such as a predefined sequence of a data format. For example, as described above, a license plate data format may be a sequence of seven characters and an example of an identification value adhering to the license plate data format may be “ABC1234.” In this example, the “A” may be said to be at the first character position, the “B” is at the second character position, and so on through the “4” which may be said to be at the seventh character position.
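The data format and character position definitions above can be expressed in a few lines of code. This is a minimal illustration, not the disclosed implementation. The `DATA_TYPES` table, the data-type names, and the `conforms` helper are hypothetical, and letters are assumed to be uppercase A through Z.

```python
import string

# Hypothetical data-type table: each data type is the set of characters it allows.
# Restricting letters to uppercase A-Z is an assumption for this sketch.
DATA_TYPES = {
    "letter": set(string.ascii_uppercase),
    "number": set(string.digits),
}

# License plate data format: seven character positions, three letters then four numbers.
LICENSE_PLATE_FORMAT = ["letter"] * 3 + ["number"] * 4


def conforms(value: str, data_format: list) -> bool:
    """Return True if value has exactly one character per character position
    and each character belongs to the data type required at that position."""
    if len(value) != len(data_format):
        return False
    return all(ch in DATA_TYPES[dtype] for ch, dtype in zip(value, data_format))


print(conforms("ABC1234", LICENSE_PLATE_FORMAT))  # True
print(conforms("AB12345", LICENSE_PLATE_FORMAT))  # False: position 3 must be a letter
print(conforms("ABC123", LICENSE_PLATE_FORMAT))   # False: only six characters

# Character positions are 1-based in the text; Python strings are indexed from 0.
value = "ABC1234"
print(value[1 - 1], value[7 - 1])  # A 4
```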
As used herein, “seed portion” may refer to a selected portion of a data format to be used in generating an associated plurality of focus values. For example, a seed portion may refer to a selected one or more character positions of a data format that may be used to generate an associated plurality of focus values based on the data type(s) associated with the selected character position(s).
As used herein, “focus value” may refer to one of the plurality of possible values that would satisfy the seed portion. The “plurality of focus values” may refer to the plurality of values that would satisfy all possible permutations of the seed portion. For example, suppose the seed portion of the license plate data format described above were selected to be the first and seventh character positions of the data format, which are associated with the letter data type and the number data type, respectively. The plurality of focus values associated with this seed portion would then be every possible letter-number combination (i.e., A-0, A-1, A-2, . . . Z-7, Z-8, Z-9). Thus, in this example the plurality of focus values would be 260 unique letter-number combinations (26 letters × 10 digits).
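Generating the focus values in the example above amounts to taking the Cartesian product of the alphabets at the seed positions. The sketch below shows this. The `focus_values` helper and the data-type alphabets are hypothetical. Focus values are written without the hyphen used in the text, so "A0" stands for "A-0".

```python
import itertools
import string

# Hypothetical data-type alphabets (uppercase letters assumed).
DATA_TYPES = {"letter": string.ascii_uppercase, "number": string.digits}

# The license plate data format from the definitions above.
LICENSE_PLATE_FORMAT = ["letter"] * 3 + ["number"] * 4


def focus_values(data_format, seed_positions):
    """Enumerate every value that satisfies the seed portion.

    seed_positions are 1-based character positions, as in the text. Each
    focus value is returned as a string with one character per seed position.
    """
    alphabets = [DATA_TYPES[data_format[p - 1]] for p in seed_positions]
    return ["".join(combo) for combo in itertools.product(*alphabets)]


values = focus_values(LICENSE_PLATE_FORMAT, [1, 7])
print(len(values))  # 260
print(values[:3])   # ['A0', 'A1', 'A2']
print(values[-1])   # Z9
```

Choosing a different seed portion changes the number of focus values, and therefore the number of sub-groups. For example, seeding on the last three (number) positions would yield 1,000 focus values.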
As used herein, “virtual processing unit” may refer to a virtual machine or container that may be configured to process a portion of a large file that corresponds to a particular focus value that is associated with the virtual processing unit. According to some embodiments, virtual processing units may be dynamically created and deleted.
As used herein, “container” may refer to a Linux container, which may relate to an operating-system-level virtualization method for running multiple isolated Linux systems (i.e., containers) on a control host using a single Linux kernel.
The disclosed embodiments are directed to systems and methods for processing a large file. The system may include one or more memory devices storing instructions, and one or more processors configured to execute the instructions to perform steps of a method. Specifically, in some embodiments, the system may execute the instructions to receive record data comprising a plurality of records, where each of the plurality of records may comprise an identification value in a common field having a data format. The system may determine a plurality of focus values based on the data format, where each of the plurality of focus values is unique and corresponds to a sub-group of the plurality of records. The system may create a plurality of virtual processing units that are each associated with a unique one of the plurality of focus values. The system may process, by each of the plurality of virtual processing units, the respective sub-group of the plurality of records that corresponds to the focus value associated with the respective virtual processing unit in response to searching the record data.
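The relationship between focus values and sub-groups can be shown with a short sketch. Each hypothetical unit searches the record data for records whose identification value yields its focus value. The record layout, the `plate` field name, and the helper names are assumptions for illustration only. Because every identification value shares one data format, each record yields exactly one focus value. The sub-groups are therefore disjoint and together cover every record.

```python
def focus_of(identification_value, seed_positions):
    """Return the characters at the 1-based seed positions of an identification value."""
    return "".join(identification_value[p - 1] for p in seed_positions)


def select_sub_group(records, common_field, seed_positions, focus):
    """Search the record data for the sub-group that corresponds to one focus value,
    as a single virtual processing unit might."""
    return [r for r in records if focus_of(r[common_field], seed_positions) == focus]


records = [
    {"name": "Ann", "plate": "ABC1234"},
    {"name": "Bob", "plate": "AXY9874"},
    {"name": "Cy", "plate": "ZZZ0001"},
]

seed = [1, 7]  # first and seventh character positions
for focus in ["A4", "Z1", "B0"]:
    names = [r["name"] for r in select_sub_group(records, "plate", seed, focus)]
    print(focus, names)
# A4 ['Ann', 'Bob']
# Z1 ['Cy']
# B0 []
```

A focus value with no matching records, such as "B0" here, simply yields an empty sub-group.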
In one embodiment, a system for processing a large file is disclosed. The system may include one or more processors and one or more associated memories storing instructions, and the processors may execute the instructions to receive record data comprising a plurality of records, where each of the plurality of records may comprise an identification value in a common field having a data format. The system may determine a plurality of focus values comprising at least a first focus value and a second focus value based on the data format. Each of the plurality of focus values may correspond to a sub-group of the plurality of records such that the first focus value may correspond to a first sub-group of the plurality of records and the second focus value may correspond to a second sub-group of the plurality of records. The system may create a first virtual processing unit for processing the first sub-group of the plurality of records corresponding to the first focus value and a second virtual processing unit for processing the second sub-group of the plurality of records corresponding to the second focus value. The system may process the first and second sub-groups of the plurality of records via the first and second virtual processing units, respectively.
In another embodiment, a method for processing a large file is disclosed. The method may include receiving record data comprising a plurality of records, where each of the plurality of records may comprise an identification value in a common field having a data format. The method may include determining a plurality of focus values based on the data format, where each of the plurality of focus values is unique and corresponds to a sub-group of the plurality of records. The method may further include creating a plurality of virtual processing units that are each associated with a unique one of the plurality of focus values. The method may further include processing, by each of the plurality of virtual processing units, the respective sub-group of the plurality of records that corresponds to the focus value associated with the respective virtual processing unit in response to searching the record data.
Although some of the above embodiments are described with respect to systems, it is contemplated that embodiments with identical or substantially similar features may alternatively be implemented as methods and/or non-transitory computer-readable media, and vice versa.
Reference will now be made in detail to exemplary embodiments of the disclosed technology, examples of which are illustrated in the accompanying drawings and disclosed herein. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 is a diagram of an exemplary large file processing system 100 that may be used to perform one or more processes for processing a large file. The components and arrangements shown in FIG. 1 are not intended to limit the disclosed embodiments, as the components used to implement the disclosed processes and features may vary. As shown, system 100 may include a file processing device 120 that may create a plurality of virtual processing units 130 that may process a file or portions of a file. In some embodiments, file processing device 120 may read or receive 112 a large input file 106, process the large input file 106 via the plurality of virtual processing units 130, and write or output 114 an output file 108. According to some embodiments, file processing device 120 may process large input file 106 by, for example, adding, deleting, transforming, or modifying data in one or more data records of large input file 106. Large input file 106 and/or output file 108 may be stored in any file-based system or database. In some embodiments, large input file 106 and/or output file 108 may be stored by an external storage device, such as, for example, an external database, computing device, server, or cloud server. Accordingly, in some embodiments, system 100 may be configured so that file processing device 120 may communicate via a network with an external storage device that stores large input file 106 and/or output file 108, such that file processing device 120 may read from and/or write to the external storage device.
A network may be of any suitable type, including individual connections via the internet such as cellular or WiFi networks. In some embodiments, a network may connect terminals, services, and mobile devices using direct connections such as radio-frequency identification (RFID), near-field communication (NFC), Bluetooth™, low-energy Bluetooth™ (BLE), WiFi™, Ethernet, ZigBee™, ambient backscatter communications (ABC) protocols, USB, WAN, or LAN. Because the information transmitted may be personal or confidential, security concerns may dictate one or more of these types of connections be encrypted or otherwise secured. In some embodiments, however, the information being transmitted may be less personal, and therefore the network connections may be selected for convenience over security.
A network may comprise any type of computer networking arrangement used to exchange data. For example, a network may be the Internet, a private data network, a virtual private network using a public network, and/or other suitable connection(s) that enable components in file processing system 100 to send and receive information between the components of file processing system 100 or to and from computing devices that are external to file processing system 100. A network may also include a public switched telephone network (“PSTN”) and/or a wireless network.
For ease of discussion, embodiments may be described in connection with processing a large file containing a plurality of employee records. It is to be understood, however, that disclosed embodiments are not limited to processing of large files of employee records but may be applied to many different types of large files containing various types of records. Further, steps or processes disclosed herein are not limited to being performed in the order described, but may be performed in any order, and some steps may be omitted, consistent with the disclosed embodiments.
The features and other aspects and principles of the disclosed embodiments may be implemented in various environments. Such environments and related applications may be specifically constructed for performing the various processes and operations of the disclosed embodiments or they may include a general purpose computer or computing platform selectively activated or reconfigured by program code to provide the necessary functionality. Further, the processes disclosed herein may be implemented by a suitable combination of hardware, software, and/or firmware. For example, the disclosed embodiments may implement general purpose machines configured to execute software programs that perform processes consistent with the disclosed embodiments. Alternatively, the disclosed embodiments may implement a specialized apparatus or system configured to execute software programs that perform processes consistent with the disclosed embodiments. Furthermore, although some disclosed embodiments may be implemented by general purpose machines as computer processing instructions, all or a portion of the functionality of the disclosed embodiments may be implemented instead in dedicated electronics hardware.
The disclosed embodiments also relate to tangible and non-transitory computer-readable media that include program instructions or program code that, when executed by one or more processors, perform one or more computer-implemented operations. The program instructions or program code may include specially designed and constructed instructions or code, and/or instructions and code well-known and available to those having ordinary skill in the computer software arts. For example, the disclosed embodiments may execute high-level and/or low-level software instructions, such as machine code (e.g., that produced by a compiler) and/or high-level code that can be executed by a processor using an interpreter.
An exemplary embodiment of file processing device 120 is shown in more detail in FIG. 2. Servers, databases, and other computing devices that may store large input file 106 and/or output file 108 may include many components that are similar to, or even have the same capabilities as, those described with respect to file processing device 120. As shown, file processing device 120 may include a processor 210, an input/output (“I/O”) device 220, and a memory 230 containing an operating system (“OS”) 240 and a program 250. File processing device 120 may be a single device or server, or it may be configured as a distributed computer system including multiple servers, devices, or computers that interoperate to perform one or more of the processes and functionalities associated with the disclosed embodiments. In some embodiments, file processing device 120 may further include a peripheral interface, a transceiver, a mobile network interface in communication with the processor 210, a bus configured to facilitate communication between the various components of the file processing device 120, and a power source configured to power one or more components of the file processing device 120.
A peripheral interface may include hardware, firmware and/or software that enables communication with various peripheral devices, such as media drives (e.g., magnetic disk, solid state, or optical disk drives), other processing devices, or any other input source used in connection with the instant techniques. In some embodiments, a peripheral interface may include a serial port, a parallel port, a general purpose input and output (GPIO) port, a game port, a universal serial bus (USB), a micro-USB port, a high definition multimedia (HDMI) port, a video port, an audio port, a Bluetooth™ port, a near-field communication (NFC) port, another like communication interface, or any combination thereof.
In some embodiments, a transceiver may be configured to communicate with compatible devices and ID tags when they are within a predetermined range. A transceiver may be compatible with one or more of: radio-frequency identification (RFID), near-field communication (NFC), Bluetooth™, low-energy Bluetooth™ (BLE), WiFi™, ZigBee™, ambient backscatter communications (ABC) protocols or similar technologies.
A mobile network interface may provide access to a cellular network, the Internet, a local area network, or another wide-area network. In some embodiments, a mobile network interface may include hardware, firmware, and/or software that allows the processor(s) 210 to communicate with other devices via wired or wireless networks, whether local or wide area, private or public, as known in the art. A power source may be configured to provide an appropriate alternating current (AC) or direct current (DC) to power components.
Processor 210 may include one or more of a microprocessor, microcontroller, digital signal processor, co-processor, or the like, or combinations thereof, capable of executing stored instructions and operating upon stored data. Memory 230 may include, in some implementations, one or more suitable types of memory for storing files including an operating system, application programs (including, for example, a web browser application, a widget or gadget engine, and/or other applications, as necessary), executable instructions, and data. Examples include volatile or non-volatile memory, random access memory (RAM), read only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic disks, optical disks, floppy disks, hard disks, removable cartridges, flash memory, a redundant array of independent disks (RAID), and the like. In one embodiment, the processing techniques described herein are implemented as a combination of executable instructions and data within the memory 230.
Processor 210 may be one or more known processing devices, such as a microprocessor from the Pentium™ family manufactured by Intel™ or the Turion™ family manufactured by AMD™. Processor 210 may constitute a single core or multiple core processor that executes parallel processes simultaneously. For example, processor 210 may be a single core processor that is configured with virtual processing technologies. In certain embodiments, processor 210 may use logical processors to simultaneously execute and control multiple processes. Processor 210 may implement virtual machine technologies, or other similar known technologies, to provide the ability to execute, control, run, manipulate, store, etc. multiple software processes, applications, programs, etc. One of ordinary skill in the art would understand that other types of processor arrangements could be implemented that provide for the capabilities disclosed herein.
File processing device 120 may include one or more storage devices configured to store information used by processor 210 (or other components) to perform certain functions related to the disclosed embodiments. In some embodiments, file processing device 120 may include memory 230 that includes instructions to enable processor 210 to execute one or more applications, such as server applications, network communication processes, and any other type of application or software known to be available on computer systems. Alternatively, the instructions, application programs, etc. may be stored in an external storage or available from a memory over a network. The one or more storage devices may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible computer-readable medium.
In one embodiment, file processing device 120 may include memory 230 that includes instructions that, when executed by processor 210, perform one or more processes consistent with the functionalities disclosed herein. Methods, systems, and articles of manufacture consistent with disclosed embodiments are not limited to separate programs or computers configured to perform dedicated tasks. For example, file processing device 120 may include memory 230 that may include one or more programs 250 to perform one or more functions of the disclosed embodiments. Moreover, processor 210 may execute one or more programs 250 located remotely from system 100. For example, system 100 may access one or more remote programs 250 that, when executed, perform functions related to disclosed embodiments. In some embodiments, file processing device 120 may include a virtual processing unit program 250 that may dynamically create a plurality of virtual processing units 130.
According to some embodiments, file processing device 120 may dynamically create a plurality of virtual processing units 130 that may be used to process a large input file 106. For example, in some embodiments, each virtual processing unit of the plurality of virtual processing units 130 may process a different portion of the large input file 106. Accordingly, the plurality of virtual processing units 130 may perform processing in parallel with one another to process the large input file 106 more quickly. In some embodiments, virtual processing units 130 may be dynamically created based on focus values determined by the system 100, as described in further detail below. Each virtual processing unit 130 may be or include a virtual machine or a container, such as a Linux container (which may be referred to as an “LXC container”). The virtual processing units 130 may operate in isolation from one another, such that a failure of one virtual processing unit 131 will not impact the processing performed by another virtual processing unit 132. A Linux container is an operating-system-level virtualization method for running multiple isolated Linux systems (“containers”) on a control host using a single Linux kernel. A Linux container, such as, for example, a Docker container, provides an environment as a service by using most of the drivers of the host operating system. The host system may use an autoscaling process to spin up multiple instances of a given Linux container. According to some embodiments, each of virtual processing units 130 may be deleted after completing processing of its portion of the large input file 106. Thus, use of the virtual processing units 130 by the file processing device 120 allows for faster processing of a large input file 106. It also enables file processing device 120 to better optimize its use of resources by dynamically deleting virtual processing units 130 that become idle.
Further, the virtual processing units 130 may be horizontally scaled to allow greater flexibility in the amount of processing capability available in the system 100.
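The parallel, isolated behavior of the virtual processing units described above can be approximated in a single Python process. This sketch is a stand-in, not the disclosed implementation. Threads from the standard library's `concurrent.futures.ThreadPoolExecutor` play the role of virtual processing units. They provide concurrency but none of the operating-system-level isolation of a virtual machine or Linux container. The `process_sub_group` work and the `new_id` field are hypothetical, and the "BAD" focus value simulates a processing error.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sub_group(focus, sub_group):
    """Hypothetical per-unit work: add a new field to every record in one sub-group."""
    if focus == "BAD":
        raise ValueError("simulated processing error")
    return [dict(record, new_id=f"{focus}-{i}") for i, record in enumerate(sub_group)]


def run_units(sub_groups):
    """Run one worker per focus value; collect results and failures separately."""
    results, failures = {}, {}
    # One worker per focus value. Leaving the with-block shuts the pool down,
    # loosely mirroring units being deleted once their work is complete.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_groups))) as pool:
        futures = {
            focus: pool.submit(process_sub_group, focus, group)
            for focus, group in sub_groups.items()
        }
        for focus, future in futures.items():
            try:
                results[focus] = future.result()
            except Exception as exc:  # one unit failing does not stop the others
                failures[focus] = exc
    return results, failures


sub_groups = {
    "A4": [{"plate": "ABC1234"}, {"plate": "AXY9874"}],
    "Z1": [{"plate": "ZZZ0001"}],
    "BAD": [{"plate": "???"}],
}
results, failures = run_units(sub_groups)
print(sorted(results))   # ['A4', 'Z1']
print(list(failures))    # ['BAD']
print(results["A4"][1])  # {'plate': 'AXY9874', 'new_id': 'A4-1'}
```

Here only the failed sub-group would need to be reprocessed, not the entire file. For CPU-bound work in CPython, threads do not run Python code in parallel. A process pool, or separate virtual machines or containers as the text describes, would be needed for true parallelism and isolation.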
Memory 230 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments. Memory 230 may also include any combination of one or more databases controlled by memory controller devices (e.g., server(s), etc.) or software, such as document management systems, Microsoft™ SQL databases, SharePoint™ databases, Oracle™ databases, Sybase™ databases, or other relational or non-relational databases. Memory 230 may include software components that, when executed by processor 210, perform one or more processes consistent with the disclosed embodiments. In some embodiments, memory 230 may include a database 260 for storing related data to enable file processing device 120 to perform one or more of the processes and functionalities associated with the disclosed embodiments.
File processing device 120 may also be communicatively connected to one or more memory devices (e.g., databases) locally or through a network. The remote memory devices may be configured to store information and may be accessed and/or managed by file processing device 120. By way of example, the remote memory devices may be document management systems, Microsoft™ SQL databases, SharePoint™ databases, Oracle™ databases, Sybase™ databases, or other relational or non-relational databases. Systems and methods consistent with disclosed embodiments, however, are not limited to separate databases or even to the use of a database.
File processing device 120 may also include one or more I/O devices 220 that may comprise one or more interfaces for receiving signals or input from devices and providing signals or output to one or more devices that allow data to be received and/or transmitted by file processing device 120. For example, file processing device 120 may include interface components, which may provide interfaces to one or more input devices, such as one or more keyboards, mouse devices, touch screens, track pads, trackballs, scroll wheels, digital cameras, microphones, sensors, and the like, that enable file processing device 120 to receive data from one or more users.
In exemplary embodiments of the disclosed technology,file processing device120 may include any number of hardware and/or software applications that are executed to facilitate any of the operations. The one or more I/O interfaces may be utilized to receive or collect data and/or user instructions from a wide variety of input devices. Received data may be processed by one or more computer processors as desired in various implementations of the disclosed technology and/or stored in one or more memory devices.
Whilefile processing device120 andvirtual processing units130 have been described as one form for implementing the techniques described herein, those having ordinary skill in the art will appreciate that other, functionally equivalent techniques may be employed. For example, as known in the art, some or all of the functionality implemented via executable instructions may also be implemented using firmware and/or hardware devices such as application specific integrated circuits (ASICs), programmable logic arrays, state machines, etc. Furthermore, other implementations offile processing device120 andvirtual processing units130 may include a greater or lesser number of components than those illustrated.
FIGS. 3A and 3B illustrate an exemplary large input file 106 and an output file 108, respectively. According to some embodiments, a large input file 106 and/or output file 108 may be any input file (such as a fixed-length file or a comma-separated file), a database file, a table, a spreadsheet, or any other type of file used to store data in an organized fashion. Although the techniques disclosed herein may be used to process a file of any size, they may be particularly useful in processing files that are 16 GB (e.g., more than approximately six million records) or larger. In some embodiments, a file may be considered to be a large file if it has a file size greater than 1 GB or has more than one million records. According to some embodiments, a large input file 106 may include a plurality of records 102a, 102b, 102c. Although only three records 102a, 102b, 102c are shown in FIG. 3A, it should be understood that a large input file 106 may contain any number of records and the three depicted are merely illustrative. In some embodiments, each record may comprise a plurality of fields, such as, for example, fields denoting a person's name, phone number, address, city, state, zip code, and social security number (SSN). A field may be a location in which data may be stored or displayed. It should be understood that the fields presented herein are merely exemplary, and any number of different fields and field types may be included in a record 102a of a large input file 106. A common field may be a field that is common to multiple records of the plurality of records of a large input file 106. According to some embodiments, a common field may store an identification value, such as a social security number. As stated above, each field may store or display data or information that is part of the associated record. For example, as shown in FIG. 3A, the SSN title field 304 may display the type of data that may be stored in an SSN field 306 that is associated with the SSN title field 304.
In some embodiments, a field may be associated with a data format such that data entered into the field must comport with the associated data format. For example, a data format associated with the SSN field 306 may be a nine-digit number (which may or may not include hyphens, as shown in FIG. 3A). Accordingly, in some embodiments, the SSN field 306 may only store a nine-digit number or a null (indicating that nothing has been entered in the SSN field 306). A data format may be a sequence of a specified number of characters (each character associated with a character position in the sequence) having specified data types. A field (e.g., SSN field 306) may be associated with a data format that may define the type of data that may be entered into or stored by the field. For example, a data format may have a specified number of character positions, and each character position may be associated with a particular data type that specifies the type of data that may be entered at that position. According to some embodiments, a data type may represent a set of possible data entries for a particular character position associated with the data type. For example, a character position having a number data type may mean that the data entered at that position in a field must be a number (e.g., one of the numbers 0 to 9). Data types may include, for example but without limitation, letters, numbers, alphanumeric characters, integers, binary numbers, base ten numbers, ASCII values, hexadecimal values, or any other predefined set of values, characters, and/or symbols that may be stored in a field of a record. For example, if a particular character position of a data format has a letter data type, it may mean that that character position can only be one of 26 possible letters (i.e., A to Z). Likewise, if a particular character position of a data format has a number data type, it may mean that that character position can only be one of 10 possible numbers (i.e., 0 to 9).
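The per-position data format described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes a data type is represented as the set of characters allowed at one position, a data format as a list of such sets, and that optional hyphens are stripped from an SSN before checking. The names `NUMBER`, `LETTER`, `SSN_FORMAT`, `normalize_ssn`, and `conforms` are hypothetical.

```python
import string

# Hypothetical representation (not from the source): a data type is the set of
# characters permitted at one character position.
NUMBER = frozenset(string.digits)           # 10 possible values, 0-9
LETTER = frozenset(string.ascii_uppercase)  # 26 possible values, A-Z

# A data format is a sequence of data types, one per character position.
# Nine-digit SSN data format; hyphens are assumed to be removed beforehand.
SSN_FORMAT = [NUMBER] * 9


def normalize_ssn(raw: str) -> str:
    """Drop optional hyphens so '123-45-6789' and '123456789' are treated alike."""
    return raw.replace("-", "")


def conforms(value: str, data_format: list) -> bool:
    """Return True if value has an allowed character at every position."""
    if len(value) != len(data_format):
        return False
    return all(ch in allowed for ch, allowed in zip(value, data_format))


print(conforms(normalize_ssn("123-45-6789"), SSN_FORMAT))  # True
print(conforms("12345678A", SSN_FORMAT))                   # False: letter in a number position
```

Under this sketch a null (empty) field would need to be handled separately, since an empty string has the wrong length and is rejected.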
According to some embodiments, data types may be customizable so that a particular data type may include a set of any combination of values, characters, or symbols indicated by a user.
According to some embodiments, a social security number data format may be a sequence of nine characters, where each of the characters has a number data type (i.e., each of the nine characters must be a number). In some embodiments, a license plate data format may be a sequence of seven alphanumeric characters. In some embodiments, a license plate data format may be a sequence of seven characters, wherein the first three characters have a letter data type (i.e., each character must be a letter) and the last four characters have a number data type (i.e., each character must be a number). According to some embodiments, a name data format may be a sequence of characters having a predetermined maximum length, where each character of the sequence has a letter data type that is one of a letter or a null (i.e., each character of the sequence must either be a letter or a blank space). The aforementioned data formats are illustrative only, and it should be understood that any number of different data formats may be used by large file processing system 100. Further, data formats may be further defined using rules that may specify, for example, that particular characters in a sequence having a letter data type must be capital letters, lower case letters, or that they can be either. Further, data formats may be modified to include or exclude additional characters or symbols.
As described in further detail below, in some embodiments, file processing device 120 may determine a plurality of focus values that may be used to generate a number of virtual processing units for processing a large input file 106. According to some embodiments, a plurality of focus values may be determined based on a data format of a common field of the plurality of records 102a, 102b, 102c in a large input file 106. For example, in some embodiments, SSN field 306 may be a common field that stores an identification value (i.e., a social security number), having a data format that limits the data stored by the field to a nine-digit number. The file processing device 120 may determine a plurality of focus values based on the nine-digit data format of the SSN field 306. As described in further detail below, this determination may be made using a seed portion that identifies a particular portion of the data format to be used in generating an associated plurality of focus values. For example, a seed portion 308 may be the last character of the nine-digit data format of the SSN field 306. Because the last character of the nine-digit data format of the SSN field 306 is a number that can be any number from zero to nine, file processing device 120 may determine a plurality of ten focus values corresponding to the numbers zero through nine. According to some embodiments, the plurality of focus values may be used to divide all of the records having the common field into a number of sub-groups. For example, in some embodiments, focus values determined from an SSN field 306 may correspond to the last digit of the nine-digit data format of the SSN, thereby enabling the records to be divided into ten sub-groups (i.e., sub-groups of records having SSNs ending in each number 0 through 9).
Accordingly, as shown in FIG. 3A, seed portion 308 may identify a portion of a data format (e.g., the last digit of an SSN), and the plurality of focus values may represent possible values for the identified portion of the data format. So, for the seed portion 308 shown in FIG. 3A, the corresponding plurality of focus values may be the numbers 0 through 9. In some embodiments, file processing device 120 may determine focus values from a seed portion of the data format of the SSN field 306 that identifies the last two digits of the nine-digit SSN data format, thereby enabling the records to be divided into one hundred sub-groups (i.e., sub-groups of records having SSNs ending in each number 00 through 99). In some embodiments, the plurality of focus values may be associated with one or more particular positions (which may be referred to as character positions or characters) of the data format (e.g., the position may be the first character or the last character of the data format), for example, based on the portion of the data format identified by a seed portion 308. Although the example described above relates to focus values determined based on an SSN data format, it should be understood that focus values may be determined based on any data format. For example, a name field may have a data format that specifies that data in that field may only comprise letters from A to Z (i.e., the data format specifies a sequence of characters that all have a data type corresponding to letters of the alphabet), in which case file processing device 120 may determine that for each character in the name field data format, there may be a plurality of 26 focus values that correspond to each letter of the alphabet. Further, file processing device 120 may associate (e.g., based on an associated seed portion) these 26 focus values with the first position of the data format corresponding to the name field (i.e., the first letter of the name in the name field).
In this instance, the plurality of records may be divided into 26 sub-groups based on the first letter of the name in the name field. In another example, the last two characters of the name field may be identified as the seed portion for determining the plurality of focus values, in which case file processing device 120 may determine that there are 676 (i.e., 26 multiplied by 26) focus values that comprise every possible combination of two letters. In some embodiments, a focus value may be a combination of one or more letters, numbers, or other data types. For example, if a seed portion corresponds to two characters of a license plate where each character could be either a letter or a number (i.e., each character could be one of 36 possible letters or numbers), then file processing device 120 may determine that there are a plurality of 1,296 (i.e., 36 multiplied by 36) focus values and 1,296 corresponding sub-groups. As shown by these examples, it should be understood that file processing device 120 may determine any number of focus values based on the underlying data format from which the focus values are determined, and that when there are more focus values, the plurality of records 102a, 102b, 102c may be divided into a greater number of sub-groups.
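How focus values follow from a seed portion can be sketched by enumerating every combination of the data types at the seed positions. In this hypothetical sketch, a data format is a list of allowed-character strings, one per position, and `seed_positions` holds the (possibly negative) indices of the positions that form the seed portion. Treating a name field as fixed-width is an assumption made only for illustration; `focus_values` and the format variables are not names from the disclosure.

```python
import itertools
import string

# Hypothetical data types: the characters allowed at a single position.
NUMBER = string.digits                                 # 10 values
LETTER = string.ascii_uppercase                        # 26 values
ALPHANUMERIC = string.ascii_uppercase + string.digits  # 36 values


def focus_values(data_format, seed_positions):
    """List every value the seed portion of a data format can take.

    data_format: sequence of allowed-character strings, one per position.
    seed_positions: indices (negative indices count from the end) that make
    up the seed portion, in the order they appear in the value.
    """
    alphabets = [data_format[i] for i in seed_positions]
    return ["".join(combo) for combo in itertools.product(*alphabets)]


ssn_format = [NUMBER] * 9
name_format = [LETTER] * 20         # assumed fixed width for illustration
plate_format = [ALPHANUMERIC] * 7

print(focus_values(ssn_format, [-1]))            # ten values, '0' through '9'
print(len(focus_values(ssn_format, [-2, -1])))   # 100
print(len(focus_values(name_format, [0])))       # 26
print(len(focus_values(name_format, [-2, -1])))  # 676
print(len(focus_values(plate_format, [0, 1])))   # 1296
```

Because `itertools.product` varies the last position fastest, the two-digit SSN case yields the values in order from "00" to "99".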
FIG. 3B illustrates an exemplary output file 108. According to some embodiments, an output file 108 may comprise substantially the same data records as the large input file 106, but may differ in that some data may have been added, modified, or deleted by virtue of the processing performed by file processing device 120 on large input file 106. For example, as shown in FIG. 3B, a data record 102a of an output file 108 may include a newly added employee ID title field 310 and an employee ID field 312. According to some embodiments, these new fields may be added to a particular record by a virtual processing unit 131 associated with the focus value that is associated with the record. For example, a virtual processing unit 131 may be associated with a focus value of “9 in the last character of the SSN field” and may process all records having a “9” as the last character of the identification value in the SSN field 306. In this way, a plurality of virtual processing units 130 may quickly process portions of a large input file 106 in parallel in order to generate an output file 108 that has been modified in some way. Further, after a particular virtual processing unit 130 has completed its processing task, it may be deleted by file processing device 120, thereby freeing up memory previously used by the deleted virtual processing unit 130 and allowing the system to operate more efficiently.
FIG. 4 shows a flowchart of an exemplary method 400 for processing a large file. Method 400 may be performed by file processing device 120.
In block 410, the system may receive (e.g., via file processing device 120) record data comprising a plurality of records. According to some embodiments, record data may be, for example, a large file (e.g., large input file 106), such as a database file, a table, a spreadsheet, or any input file such as a fixed-length file or a comma-separated file. In some embodiments, file processing device 120 may receive record data from local data storage. In some embodiments, file processing device 120 may receive record data from a remote storage device via a network. According to some embodiments, each of the plurality of records may comprise a number of fields for storing or displaying data and may further comprise an identification value in a common field (i.e., a field that is common to a plurality of records). For example, each of the plurality of records may include a field for storing an identification value, such as a social security number (the “SSN field”). In some embodiments, the common field may be associated with a data format. For example, the SSN field may be associated with a data format that specifies that data stored in the SSN field must comprise a nine-digit number (which may or may not include dashes).
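As one possible illustration of block 410, the sketch below streams records from a comma-separated file rather than loading the whole file at once. It assumes the file has a header row and that the common field is a column named `ssn`; the function name `read_records` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_records(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield records one at a time from a comma-separated stream.

    Assumes a header row and a common field named "ssn" (hypothetical);
    optional dashes are removed from the identification value.
    """
    for row in csv.DictReader(stream):
        row["ssn"] = row["ssn"].replace("-", "")
        yield row


# In practice the stream might come from open(path, newline="") on local or
# network-mounted storage; an in-memory sample is used here.
sample = io.StringIO("name,ssn\nAda,123-45-6789\nAlan,987654321\n")
records = list(read_records(sample))
print(records[0])  # {'name': 'Ada', 'ssn': '123456789'}
```

Using a generator keeps memory use roughly constant regardless of file size, which matters for inputs in the multi-gigabyte range discussed above.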
In block 420, the system may determine a plurality of focus values based on the data format associated with the common field. In some embodiments, each of the plurality of focus values may be unique. The system may determine the plurality of focus values by selecting a specified portion of the data format (e.g., the last two numbers of a nine-digit data format) and generating the plurality of focus values to represent every possible combination of values that may satisfy that portion of the data format. The portion of the data format selected to create the focus values in this manner may be referred to as the “seed portion.” For example, if the data format is a nine-digit number, the focus values may be based on a seed portion that is one digit of the nine-digit number (e.g., the last digit), so that the system may generate ten focus values (i.e., 0 through 9). In another example, the system may determine that the focus values may be based on a seed portion that is two digits of the nine-digit number (e.g., the last two digits), so that the system may generate 100 focus values (i.e., 00 through 99). According to some embodiments, the seed portion of the data format may include a data type (e.g., number, letter, alphanumeric, binary, etc.) and the position of the data (e.g., the first number of an SSN, or the last letter of a name) within the data format. As previously mentioned, the plurality of focus values may represent every possible value for the selected seed portion of the data format.
For example, there may be 10 focus values that correspond to a seed portion that represents a one-digit number, 100 focus values that correspond to a seed portion that represents a two-digit number, 26 focus values that correspond to a seed portion that represents a letter, 260 focus values that correspond to a seed portion that represents a combination of a number and a letter, and so on.
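The counts in the preceding example follow a single rule. Writing S for the set of character positions in the seed portion and T_i for the data type at position i (notation introduced here for illustration only), the number of focus values is the product of the sizes of those data types:

```latex
N_{\text{focus}} = \prod_{i \in S} \lvert T_i \rvert,
\qquad \text{e.g.} \qquad
10, \quad 10 \times 10 = 100, \quad 26, \quad 10 \times 26 = 260.
```

The same rule gives 26 × 26 = 676 for two letters and 36 × 36 = 1,296 for two alphanumeric characters, and it is why the number of focus values grows exponentially with the length of the seed portion.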
In some embodiments, the system may select a seed portion that is a single digit (e.g., the last digit) of a data format associated with a common field for storing an identification value, which may be referred to as a seed portion of a first order of magnitude (e.g., a single numerical digit, a single letter, a single alphanumeric character, or the like). In some embodiments, the system may select a seed portion that is two digits of a data format associated with a common field for storing an identification value, which may be referred to as a seed portion of a second order of magnitude (e.g., a pair of numerical digits, a pair of letters, a pair of alphanumeric characters, or the like). It will be understood that as the order of magnitude of the seed portion rises, the number of focus values may rise exponentially. According to some embodiments, the system may select the magnitude of the seed portion (i.e., whether the seed portion is one digit, two digits, or more) based on the size of the record data, the number of records, or the throughput of processing the records. For example, the system may default to using a seed portion of a first order of magnitude. But, if the system determines that the size of the record data exceeds a predetermined threshold size, the system may then use a seed portion of a second order of magnitude or higher to determine the plurality of focus values. In some embodiments, the magnitude of the seed portion may be determined in response to a user input received by file processing device 120. For example, a user may input a selection of a portion of a data format (e.g., the user may select the last digit of a nine-digit social security number data format) to be used as the seed portion from which a plurality of focus values is created.
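The default-then-escalate rule for choosing the order of magnitude of the seed portion might look like the following minimal sketch. The 1 GB threshold is an assumption borrowed from the large-file size mentioned earlier in this description; an actual threshold, and whether record count or throughput is used instead of file size, would be implementation choices. `seed_length`, `seed_length_for_file`, and `DEFAULT_THRESHOLD_BYTES` are hypothetical names.

```python
import os
from typing import Optional

# Assumed threshold (not specified for this step): about 1 GB.
DEFAULT_THRESHOLD_BYTES = 1024 ** 3


def seed_length(size_bytes: int,
                threshold: int = DEFAULT_THRESHOLD_BYTES,
                user_choice: Optional[int] = None) -> int:
    """Return how many characters of the identification value form the seed portion."""
    if user_choice is not None:  # a seed portion selected by the user takes priority
        return user_choice
    if size_bytes > threshold:   # large record data: second order of magnitude
        return 2
    return 1                     # default: first order of magnitude


def seed_length_for_file(path: str) -> int:
    """Apply the rule to a file on disk using its size in bytes."""
    return seed_length(os.path.getsize(path))
```

For a nine-digit SSN, a result of 1 corresponds to 10 focus values and a result of 2 to 100.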
According to some embodiments, each of the plurality of focus values may correspond to a sub-group of the plurality of records. For example, in the case where the system uses a seed portion corresponding to the last digit of a social security number data format, the system may determine that there are 10 focus values (i.e., 0-9), and each one of those focus values may correspond to a sub-group of the plurality of records. For example, focus value 0 may correspond to a sub-group of records having a social security number ending in “0,” focus value 1 may correspond to a sub-group of records having a social security number ending in “1,” and so on. In this way, the plurality of records may be divided into sub-groups, with each sub-group corresponding to a particular focus value.
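Dividing the records into sub-groups in a single pass can be sketched with a dictionary keyed by focus value. This is an illustrative assumption rather than the disclosed implementation: records are dicts, the common field is named `ssn`, and the focus value is taken from the last `seed_length` characters of the identification value. Pre-creating an empty list for every focus value makes an empty sub-group easy to detect later. `partition_by_focus` is a hypothetical name.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def partition_by_focus(records: Iterable[Record],
                       focus_values: List[str],
                       seed_length: int,
                       field: str = "ssn") -> Dict[str, List[Record]]:
    """Assign each record to the sub-group of its focus value in one pass."""
    groups: Dict[str, List[Record]] = {value: [] for value in focus_values}
    for record in records:
        key = record[field][-seed_length:]  # e.g. last digit of the SSN
        groups[key].append(record)          # a KeyError flags a malformed value
    return groups


records = [{"ssn": "123456789"}, {"ssn": "987654320"}, {"ssn": "111111119"}]
groups = partition_by_focus(records, list("0123456789"), seed_length=1)
print(len(groups["9"]), len(groups["0"]), len(groups["5"]))  # 2 1 0
```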
In block 430, the system may create a plurality of virtual processing units 130. According to some embodiments, each virtual processing unit 130 may be associated with a unique one of the plurality of focus values. Accordingly, in some embodiments, each of the virtual processing units 130 may correspond to a particular sub-group of the plurality of records that is associated with the same focus value as the virtual processing unit. As described above, a virtual processing unit 131 may comprise a virtual machine or a container, such as a Linux container.
In block 440, the system may process, by each of the plurality of virtual processing units 130, the respective sub-group of the plurality of records that corresponds to the focus value associated with the respective virtual processing unit 130. In some embodiments, this processing may be performed in response to searching the record data. For example, the system may search the record data to identify each record that is part of a particular sub-group by determining whether each record contains data in a common field that corresponds to the focus value associated with the sub-group. For example, a virtual processing unit 131 that is associated with a focus value of “last digit of SSN is 9” may process all records containing a social security number in the SSN field that ends in 9. In some embodiments, sub-groups may be identified in response to a single search of the record data performed by the system. In some embodiments, each virtual processing unit 130 may independently search or read the record data to identify records that contain its respective focus value.
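For the variant in which each virtual processing unit searches the record data on its own, the selection step reduces to a filter on the common field. The sketch below is hypothetical: it assumes dict records with an `ssn` field and that a focus value such as “9” means “last digit of SSN is 9,” so a suffix match is used. `select_sub_group` is not a name from the disclosure.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def select_sub_group(records: Iterable[Record],
                     focus_value: str,
                     field: str = "ssn") -> List[Record]:
    """Scan all of the record data and keep records matching one focus value."""
    return [r for r in records if r[field].endswith(focus_value)]


records = [{"ssn": "123456789"}, {"ssn": "987654320"}, {"ssn": "111111119"}]
print(select_sub_group(records, "9"))  # [{'ssn': '123456789'}, {'ssn': '111111119'}]
```

Compared with a single shared pass, this approach reads the record data once per unit, trading extra I/O for units that need no coordination with one another.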
According to some embodiments, each virtual processing unit 130 may process its respective sub-group of records in parallel with the other virtual processing units 130. Thus, if there are 10 focus values, there may be 10 virtual processing units 130 that may simultaneously process 10 different records. However, in some instances, large input file 106 may include no records that correspond to a particular focus value (e.g., the input file has no records with a social security number ending in the number “9”), in which case the virtual processing unit 130 corresponding to that focus value may be deleted after searching large input file 106 and failing to identify any records corresponding to its associated focus value. Accordingly, the absence of a sub-group of records associated with a particular virtual processing unit 130 will not corrupt or otherwise negatively impact the processing of large input file 106. A virtual processing unit 130 may process a record by reading the record from, for example, a large input file 106, and outputting an output record for inclusion in, for example, an output file 108. A virtual processing unit 130 may modify, delete, or add data to a record in the course of processing the record. Because each virtual processing unit 130 performs its processing of the large input file 106 in isolation from the others, if a particular virtual processing unit 130 fails or encounters an error during processing, it may simply restart its processing of the large input file 106 without interrupting the processing being performed by the other virtual processing units 130. In some embodiments, the system may remove duplicate records in the course of processing the record data by, for example, comparing records to determine whether more than one record has the same social security number and deleting the duplicate records before generating an output file 108.
According to some embodiments, a virtual processing unit 130 may be deleted by the system after it has finished processing its sub-group of records.
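The parallel processing, restart, empty-unit, and de-duplication behavior described above can be approximated on a single machine as shown below. This is a sketch under stated assumptions: worker threads from Python's `concurrent.futures` stand in for the virtual machines or containers of the disclosure, `process_record` is a placeholder for whatever per-record work is required, and `run_unit` and `process_in_parallel` are hypothetical names. Because records sharing an SSN always fall in the same sub-group, de-duplicating on the SSN after the units finish removes every duplicate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Record = Dict[str, object]


def process_record(record: Record) -> Record:
    """Placeholder per-record work: copy the record and add a field."""
    out = dict(record)
    out["processed"] = True
    return out


def run_unit(focus_value: str, sub_group: List[Record], max_attempts: int = 3) -> List[Record]:
    """Process one sub-group, restarting only this unit's work if it fails."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return [process_record(r) for r in sub_group]
        except Exception as exc:  # other units keep running unaffected
            last_error = exc
    raise RuntimeError(f"unit for focus value {focus_value!r} failed") from last_error


def process_in_parallel(groups: Dict[str, List[Record]], field: str = "ssn") -> List[Record]:
    """Run one worker per non-empty sub-group and merge de-duplicated output.

    Empty sub-groups get no worker, standing in for deleting that unit.
    """
    active = {fv: recs for fv, recs in groups.items() if recs}
    output: List[Record] = []
    seen = set()
    with ThreadPoolExecutor(max_workers=max(1, len(active))) as pool:
        futures = {fv: pool.submit(run_unit, fv, recs) for fv, recs in active.items()}
        for fv in sorted(futures):  # deterministic output order
            for record in futures[fv].result():
                if record[field] not in seen:  # drop records with a repeated SSN
                    seen.add(record[field])
                    output.append(record)
    return output
```

Threads keep the example short; for CPU-bound work in Python, `ProcessPoolExecutor` or separate containers would give true parallelism and the isolation the description relies on.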
FIG. 5 shows a flowchart of another exemplary method 500 for processing a large file. Method 500 may be performed by file processing device 120.
In block 510, the system may receive record data comprising a plurality of records in a manner substantially similar to that described with respect to block 410 above.
In block 520, the system may determine a plurality of focus values based on the data format associated with a common field in a manner substantially similar to that described with respect to block 420 above. In some embodiments, the plurality of focus values may comprise at least a first focus value and a second focus value. According to some embodiments, each of the plurality of focus values may correspond to a sub-group of the plurality of records such that the first focus value may correspond to a first sub-group of the plurality of records and the second focus value may correspond to a second sub-group of the plurality of records.
In block 530, the system may create a first virtual processing unit 131 in a manner substantially similar to that described with respect to block 430 above. According to some embodiments, the first virtual processing unit 131 may be created for processing the first sub-group of the plurality of records corresponding to the first focus value. For example, a first virtual processing unit 131 may be associated with a focus value of “last digit of SSN is 1,” and may be created to process a sub-group of records that include social security numbers ending in the number 1.
In block 540, the system may create a second virtual processing unit 132 in a manner substantially similar to that described with respect to block 430 above. According to some embodiments, the second virtual processing unit 132 may be created for processing the second sub-group of the plurality of records corresponding to the second focus value. For example, a second virtual processing unit 132 may be associated with a focus value of “last digit of SSN is 2,” and may be created to process a sub-group of records that include social security numbers ending in the number 2.
In block 550, the system may process the first and second sub-groups of the plurality of records via the first and second virtual processing units 131, 132, respectively, in a manner substantially similar to that described above with respect to block 440. Although method 500 describes the creation and use of two virtual processing units 130, it should be understood that any number of virtual processing units 130 may be created to process respective sub-groups of records, where the number of virtual processing units 130 corresponds to the number of focus values determined by the system. In some embodiments, the system may determine the number of focus values based on the size of the large input file. For example, the system may default to using a seed portion of a first order of magnitude (e.g., a single numerical digit, a single letter, a single alphanumeric character, or the like) to generate focus values, but if the system determines that the large file (e.g., large input file 106) is larger than a predetermined threshold size, then the system may use a seed portion of a second order of magnitude (e.g., a pair of numerical digits, a pair of letters, a pair of alphanumeric characters, or the like) to determine the number of focus values. In some embodiments, the system may determine the number of focus values in response to a user input. For example, a user may input a selection of a seed portion that the system may use to generate focus values.
As used in this application, the terms “component,” “module,” “system,” “server,” “processor,” “memory,” and the like are intended to include one or more computer-related units, such as but not limited to hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, such as data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal.
Certain embodiments and implementations of the disclosed technology are described above with reference to block and flow diagrams of systems and methods and/or computer program products according to example embodiments or implementations of the disclosed technology. It will be understood that one or more blocks of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, respectively, can be implemented by computer-executable program instructions. Likewise, some blocks of the block diagrams and flow diagrams may not necessarily need to be performed in the order presented, may be repeated, or may not necessarily need to be performed at all, according to some embodiments or implementations of the disclosed technology.
These computer-executable program instructions may be loaded onto a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a particular machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks.
As an example, embodiments or implementations of the disclosed technology may provide for a computer program product, including a computer-usable medium having a computer-readable program code or program instructions embodied therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks. Likewise, the computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
Accordingly, blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, can be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.
Certain implementations of the disclosed technology are described above with reference to user devices, which may include mobile computing devices. Those skilled in the art recognize that there are several categories of mobile devices, generally known as portable computing devices, that can run on batteries but are not usually classified as laptops. For example, mobile devices can include, but are not limited to, portable computers, tablet PCs, internet tablets, PDAs, ultra mobile PCs (UMPCs), wearable devices, and smart phones. Additionally, implementations of the disclosed technology can be utilized with internet of things (IoT) devices, smart televisions and media devices, appliances, automobiles, toys, and voice command devices, along with peripherals that interface with these devices.
In this description, numerous specific details have been set forth. It is to be understood, however, that implementations of the disclosed technology may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. References to “one embodiment,” “an embodiment,” “some embodiments,” “example embodiment,” “various embodiments,” “one implementation,” “an implementation,” “example implementation,” “various implementations,” “some implementations,” etc., indicate that the implementation(s) of the disclosed technology so described may include a particular feature, structure, or characteristic, but not every implementation necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one implementation” does not necessarily refer to the same implementation, although it may.
Throughout the specification and the claims, the following terms take at least the meanings explicitly associated herein, unless the context clearly dictates otherwise. The term “connected” means that one function, feature, structure, or characteristic is directly joined to or in communication with another function, feature, structure, or characteristic. The term “coupled” means that one function, feature, structure, or characteristic is directly or indirectly joined to or in communication with another function, feature, structure, or characteristic. The term “or” is intended to mean an inclusive “or.” Further, the terms “a,” “an,” and “the” are intended to mean one or more unless specified otherwise or clear from the context to be directed to a singular form. By “comprising,” “containing,” or “including” is meant that at least the named element or method step is present in the article or method, but the presence of other elements or method steps is not excluded, even if the other such elements or method steps have the same function as what is named.
While certain embodiments of this disclosure have been described in connection with what is presently considered to be the most practical and various embodiments, it is to be understood that this disclosure is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
This written description uses examples to disclose certain embodiments of the technology and also to enable any person skilled in the art to practice certain embodiments of this technology, including making and using any apparatuses or systems and performing any incorporated methods. The patentable scope of certain embodiments of the technology is defined in the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
EXEMPLARY USE CASES
The following exemplary use case describes a typical user flow pattern. It is intended solely for explanatory purposes and not in limitation. A financial services provider may have a large file containing a large number of records that it wants to process in some way. For example, each record of the large file may include information about an employee, such as the employee's name, address, phone number, social security number, job title, pay rate, office location, and hire date. The financial services provider may decide to create (or update) an employee ID number for each employee, so each employee record must be modified to add a new field containing the employee ID number. Processing the large file using conventional processing methods is likely to take a very long time because the file is so large. The present system, however, may process the large file much faster than conventional methods. First, the large file is read or received by the system (e.g., via the large file processing device 120). The system (e.g., via the large file processing device 120) may then create a number of virtual processing units (e.g., virtual processing units 130) to process the large file in parallel, assigning each virtual processing unit to process a sub-group of the records in the large file. The records can be virtually divided into sub-groups (e.g., via the large file processing device 120) based on a specified portion of the social security number of each record. For example, a first virtual processing unit may process all records having a social security number ending in “1,” a second virtual processing unit may process all records having a social security number ending in “2,” and so on.
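The partitioning step in this use case can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each record is assumed to be a Python dict with a hypothetical `ssn` key, the focus value is the last digit of the social security number, and a thread in a `ThreadPoolExecutor` stands in for a virtual processing unit. The functions `partition_by_ssn_last_digit` and `process_large_file` are hypothetical names; the description does not specify whether virtual processing units are threads, processes, or separate machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_ssn_last_digit(records):
    """Group records into sub-groups keyed by the last SSN digit (the focus value).

    Assumes each record is a dict with a hypothetical "ssn" key such as "123-45-6781".
    """
    groups = defaultdict(list)
    for record in records:
        digits = record["ssn"].replace("-", "")
        groups[digits[-1]].append(record)
    return dict(groups)


def process_large_file(records, process_subgroup, max_workers=10):
    """Submit one task per sub-group; each task stands in for a virtual processing unit."""
    groups = partition_by_ssn_last_digit(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            focus: pool.submit(process_subgroup, focus, group)
            for focus, group in groups.items()
        }
        # Collect each unit's result; leaving the "with" block shuts the pool down.
        return {focus: future.result() for focus, future in futures.items()}
```

For example, `process_large_file(records, lambda focus, group: len(group))` returns a per-sub-group record count. Threads are used here only to keep the sketch short; for CPU-bound record transformations in CPython, a `ProcessPoolExecutor` would likely be the better fit.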
In addition to the processing speed gained by using multiple virtual processing units, the virtual processing units operate in isolation from one another, so if one of them fails or encounters an error, it can simply restart the processing of its own sub-group of records without affecting the processing performed by the other virtual processing units. As each virtual processing unit processes its respective sub-group, it appends to each record of the sub-group a new field that contains an employee ID number. Once a particular virtual processing unit has completed processing, the system (e.g., via the large file processing device 120) deletes that virtual processing unit, thereby freeing system resources for other tasks. After the system finishes processing the large file, it outputs an updated file in which every record of the large file has been modified to include an employee ID number.
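The isolation and restart behavior described above can be sketched for a single sub-group. This is a hedged illustration, not the disclosed method: the retry limit `max_attempts`, the optional `transform` hook, and the ID scheme `E<focus>-<sequence>` are all hypothetical. Each attempt works on deep copies of the input records, so a restart begins from clean data and a failure in one sub-group never touches another sub-group's records.

```python
import copy
import logging


def assign_employee_ids(focus, subgroup, max_attempts=3, transform=None):
    """Process one sub-group in isolation, restarting from the original records on error.

    The ID format f"E{focus}-{seq:06d}" is a hypothetical scheme; because each
    sub-group has a distinct focus value, IDs stay unique across sub-groups.
    """
    transform = transform or (lambda record: record)
    for attempt in range(1, max_attempts + 1):
        try:
            output = []
            for seq, record in enumerate(subgroup, start=1):
                # Work on a copy so a failed attempt leaves the input untouched.
                updated = transform(copy.deepcopy(record))
                updated["employee_id"] = f"E{focus}-{seq:06d}"
                output.append(updated)
            return output
        except Exception:
            # Only this sub-group restarts; other units are unaffected.
            logging.warning("sub-group %s failed on attempt %d; restarting", focus, attempt)
    raise RuntimeError(f"sub-group {focus} failed after {max_attempts} attempts")
```

A function like this can be handed to a thread or process pool as the per-sub-group task; when the task returns, its worker is released, which loosely corresponds to deleting the virtual processing unit. Catching a broad `Exception` is a deliberate choice here, since any error should trigger a restart of that sub-group only.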
Certain implementations of the disclosed technology are described above with reference to block and flow diagrams of systems and methods and/or computer program products according to example implementations of the disclosed technology. It will be understood that one or more blocks of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, respectively, can be implemented by computer-executable program instructions. Likewise, some blocks of the block diagrams and flow diagrams may not necessarily need to be performed in the order presented, may be repeated, or may not necessarily need to be performed at all, according to some implementations of the disclosed technology.
These computer-executable program instructions may be loaded onto a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a particular machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks. As an example, implementations of the disclosed technology may provide for a computer program product, including a computer-usable medium having a computer-readable program code or program instructions embodied therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks. Likewise, the computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
As used herein, unless otherwise specified, the use of the ordinal adjectives “first,” “second,” “third,” etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

Claims (3)

The invention claimed is:
1. A system for processing a large file, comprising:
one or more processors; and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
receive record data comprising a plurality of records, each of the plurality of records comprising an identification value in a common field, the common field having a data format comprising a sequence of characters and each of the identification values corresponding to a license plate number;
determine, based on the data format, a plurality of focus values comprising at least a first focus value and a second focus value, each of the plurality of focus values corresponding to a sub-group of the plurality of records such that the first focus value corresponds to a first sub-group of the plurality of records and the second focus value corresponds to a second sub-group of the plurality of records, wherein the plurality of focus values comprise a set of 260 unique two-character sequences where each unique two-character sequence comprises a letter followed by a number with the plurality of focus values corresponding to a specified portion of the sequence of characters in the data format;
create a first virtual processing unit for processing the first sub-group of the plurality of records corresponding to the first focus value;
create a second virtual processing unit for processing the second sub-group of the plurality of records corresponding to the second focus value; and
process the first and second sub-groups of the plurality of records via the first and second virtual processing units, respectively.
2. The system of claim 1, wherein the data format comprises a sequence of seven characters, wherein a first, second, and third character of the sequence are associated with a letter data type and a fourth, fifth, sixth, and seventh character of the sequence are associated with a number data type.
3. The system of claim 2, wherein the specified portion of the data format is the third and fourth characters of the sequence of seven characters.
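The focus-value scheme in claims 1 to 3 can be checked with a short sketch. Under claim 2's seven-character format (three letters followed by four digits), the third and fourth characters named in claim 3 are always a letter followed by a digit, which gives 26 × 10 = 260 possible focus values, matching claim 1. The regular expression, the function names, and the choice to skip empty sub-groups (so no virtual processing unit would be created for an unused focus value) are assumptions for illustration; real license plate formats vary by jurisdiction.

```python
import re
import string

# Assumed plate format from claim 2: three letters followed by four digits.
PLATE_FORMAT = re.compile(r"[A-Z]{3}[0-9]{4}")


def focus_values():
    """All letter-then-digit pairs: 26 * 10 = 260 focus values."""
    return [letter + digit for letter in string.ascii_uppercase for digit in string.digits]


def focus_value_for(plate):
    """Return the 3rd and 4th characters (1-indexed) of a seven-character plate."""
    plate = plate.strip().upper()
    if not PLATE_FORMAT.fullmatch(plate):
        raise ValueError(f"plate {plate!r} does not match the assumed LLLDDDD format")
    return plate[2:4]


def group_plates(plates):
    """Bucket plates by focus value, keeping only non-empty sub-groups."""
    groups = {value: [] for value in focus_values()}
    for plate in plates:
        groups[focus_value_for(plate)].append(plate)
    return {value: members for value, members in groups.items() if members}
```

For instance, `focus_value_for("ABC1234")` yields `"C1"`, so that record would be routed to the virtual processing unit associated with focus value `"C1"`.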

Priority Applications (5)

Application Number | Publication | Priority Date | Filing Date | Title
US15/659,143 | US9934287B1 (en) | 2017-07-25 | 2017-07-25 | Systems and methods for expedited large file processing
US15/905,163 | US10191952B1 (en) | 2017-07-25 | 2018-02-26 | Systems and methods for expedited large file processing
US16/233,796 | US10949433B2 (en) | 2017-07-25 | 2018-12-27 | Systems and methods for expedited large file processing
US17/201,311 | US11625408B2 (en) | 2017-07-25 | 2021-03-15 | Systems and methods for expedited large file processing
US18/297,957 | US12111838B2 (en) | 2017-07-25 | 2023-04-10 | Systems and methods for expedited large file processing

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
US15/659,143 | US9934287B1 (en) | 2017-07-25 | 2017-07-25 | Systems and methods for expedited large file processing

Related Child Applications (1)

Application Number | Relationship | Publication | Priority Date | Filing Date | Title
US15/905,163 | Continuation | US10191952B1 (en) | 2017-07-25 | 2018-02-26 | Systems and methods for expedited large file processing

Publications (1)

Publication Number | Publication Date
US9934287B1 (en) | 2018-04-03

Family ID: 61724876

Family Applications (5)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/659,143 | Active | US9934287B1 (en) | 2017-07-25 | 2017-07-25 | Systems and methods for expedited large file processing
US15/905,163 | Active | US10191952B1 (en) | 2017-07-25 | 2018-02-26 | Systems and methods for expedited large file processing
US16/233,796 | Active, expires 2037-12-27 | US10949433B2 (en) | 2017-07-25 | 2018-12-27 | Systems and methods for expedited large file processing
US17/201,311 | Active, expires 2037-11-02 | US11625408B2 (en) | 2017-07-25 | 2021-03-15 | Systems and methods for expedited large file processing
US18/297,957 | Active | US12111838B2 (en) | 2017-07-25 | 2023-04-10 | Systems and methods for expedited large file processing

Family Applications After (4)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/905,163 | Active | US10191952B1 (en) | 2017-07-25 | 2018-02-26 | Systems and methods for expedited large file processing
US16/233,796 | Active, expires 2037-12-27 | US10949433B2 (en) | 2017-07-25 | 2018-12-27 | Systems and methods for expedited large file processing
US17/201,311 | Active, expires 2037-11-02 | US11625408B2 (en) | 2017-07-25 | 2021-03-15 | Systems and methods for expedited large file processing
US18/297,957 | Active | US12111838B2 (en) | 2017-07-25 | 2023-04-10 | Systems and methods for expedited large file processing

Country Status (1)

Country | Link
US (5) | US9934287B1 (en)

US9934287B1 (en)*2017-07-252018-04-03Capital One Services, LlcSystems and methods for expedited large file processing

Patent Citations (91)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5625815A (en)* | 1995-01-23 | 1997-04-29 | Tandem Computers, Incorporated | Relational database system and method with high data availability during table data restructuring
US5837983A (en)* | 1996-06-10 | 1998-11-17 | Psc, Inc. | Readability monitoring system for optical codes
US6230164B1 (en)* | 1997-05-09 | 2001-05-08 | Alcatel Usa Sourcing, L.P. | Communication system with rapid database synchronization
US6115705A (en)* | 1997-05-19 | 2000-09-05 | Microsoft Corporation | Relational database system and method for query processing using early aggregation
US5897638A (en)* | 1997-06-16 | 1999-04-27 | Ab Initio Software Corporation | Parallel virtual file system
US6367070B1 (en)* | 1998-01-13 | 2002-04-02 | Intel Corporation | Means and method for establishing loop-level parallelism
US6173415B1 (en)* | 1998-05-22 | 2001-01-09 | International Business Machines Corporation | System for scalable distributed data structure having scalable availability
US6381601B1 (en)* | 1998-12-22 | 2002-04-30 | Hitachi, Ltd. | Grouping and duplicate removal method in a database
US20080172402A1 (en)* | 1999-09-28 | 2008-07-17 | University Of Tennessee Research Foundation | Method of indexed storage and retrieval of multidimensional information
US20030037049A1 (en)* | 2001-05-11 | 2003-02-20 | Guenter Weigelt | Dynamic buffer allocation
US20030115206A1 (en)* | 2001-12-19 | 2003-06-19 | Gilbert Gary Martin | Method for fault tolerant modification of data representation in a large database
US20050234913A1 (en)* | 2002-05-28 | 2005-10-20 | Providian Financial Corporation | Method and system for creating and maintaining an index for tracking files relating to people
US7657540B1 (en)* | 2003-02-04 | 2010-02-02 | Seisint, Inc. | Method and system for linking and delinking data records
US7403942B1 (en)* | 2003-02-04 | 2008-07-22 | Seisint, Inc. | Method and system for processing data records
US20040220929A1 (en)* | 2003-04-30 | 2004-11-04 | Rozeboom Paul L. | Partitioning a database keyed with variable length keys
US20050049996A1 (en)* | 2003-08-25 | 2005-03-03 | International Business Machines Corporation | Method, system, and article of manufacture for parallel processing and serial loading of hierarchical data
US20050102325A1 (en)* | 2003-09-15 | 2005-05-12 | Joel Gould | Functional dependency data profiling
US7590620B1 (en)* | 2004-06-18 | 2009-09-15 | Google Inc. | System and method for analyzing data records
US20060230400A1 (en)* | 2005-03-30 | 2006-10-12 | International Business Machines Corporation | Allocating entitled processor cycles for preempted virtual processors
US20070022137A1 (en)* | 2005-07-22 | 2007-01-25 | Scarboro Danny M | Data source business component generator
US20100011019A1 (en)* | 2005-07-22 | 2010-01-14 | Scarboro Danny M | Database Business Components Code Generator
US20100223261A1 (en)* | 2005-09-27 | 2010-09-02 | Devajyoti Sarkar | System for Communication and Collaboration
US8040552B2 (en)* | 2006-12-28 | 2011-10-18 | Fuji Xerox Co., Ltd. | Variable data image generating device, variable data image forming system and computer readable storage medium
US8538123B1 (en)* | 2007-03-09 | 2013-09-17 | Cummins-Allison Corp. | Apparatus and system for imaging currency bills and financial documents and method for using the same
US20080313189A1 (en)* | 2007-06-15 | 2008-12-18 | Sap Ag | Parallel processing of assigned table partitions
US20090164471A1 (en)* | 2007-12-19 | 2009-06-25 | Jinmei Shen | Managing Distributed Data
US20090182970A1 (en)* | 2008-01-16 | 2009-07-16 | Battista Robert J | Data Transmission for Partition Migration
US20090307229A1 (en)* | 2008-04-28 | 2009-12-10 | Infosys Technologies Limted | Method and system for rapidly processing and transporting large XML files
US20090276477A1 (en)* | 2008-05-02 | 2009-11-05 | Oracle International Corporation | Method of partitioning a database
US20100106724A1 (en)* | 2008-10-23 | 2010-04-29 | Ab Initio Software Llc | Fuzzy Data Operations
US20170161326A1 (en)* | 2008-10-23 | 2017-06-08 | Ab Initio Technology Llc | Fuzzy Data Operations
US8929640B1 (en)* | 2009-04-15 | 2015-01-06 | Cummins-Allison Corp. | Apparatus and system for imaging currency bills and financial documents and method for using the same
US20100281027A1 (en)* | 2009-04-30 | 2010-11-04 | International Business Machines Corporation | Method and system for database partition
US20100306854A1 (en)* | 2009-06-01 | 2010-12-02 | Ab Initio Software Llc | Generating Obfuscated Data
US20110040810A1 (en)* | 2009-08-12 | 2011-02-17 | International Business Machines Corporation | Scalable file management for a shared file system
US20130086135A1 (en)* | 2009-08-12 | 2013-04-04 | International Business Machines Corporation | Scalable file management for a shared file system
US8234250B1 (en)* | 2009-09-17 | 2012-07-31 | Netapp. Inc. | Processing data of a file using multiple threads during a deduplication gathering phase
US20110225215A1 (en)* | 2010-03-12 | 2011-09-15 | Hitachi, Ltd. | Computer system and method of executing application program
US20110252275A1 (en)* | 2010-04-07 | 2011-10-13 | Verizon Patent And Licensing Inc. | Method and system for partitioning data files for efficient processing
US8600994B1 (en)* | 2010-09-02 | 2013-12-03 | Teradata Us, Inc. | Performing an outer join between a small table and a large table
US8719270B2 (en)* | 2010-09-29 | 2014-05-06 | International Business Machines Corporation | Utilizing metadata generated during XML creation to enable parallel XML processing
US20130325915A1 (en)* | 2011-02-23 | 2013-12-05 | Hitachi, Ltd. | Computer System And Data Management Method
US20120239558A1 (en)* | 2011-03-16 | 2012-09-20 | GridX, Inc. | Method and systems for efficiently processing large volumes of complex small value financial transactions
US20130297788A1 (en)* | 2011-03-30 | 2013-11-07 | Hitachi, Ltd. | Computer system and data management method
US20140082053A1 (en)* | 2011-03-30 | 2014-03-20 | Lin Chen | System and method for generating information file based on parallel processing
US20140059000A1 (en)* | 2011-04-08 | 2014-02-27 | Hitachi, Ltd. | Computer system and parallel distributed processing method
US9817700B2 (en)* | 2011-04-26 | 2017-11-14 | International Business Machines Corporation | Dynamic data partitioning for optimal resource utilization in a parallel data processing system
US20130067025A1 (en)* | 2011-09-12 | 2013-03-14 | Microsoft Corporation | Target subscription for a notification distribution system
US20130124474A1 (en)* | 2011-11-15 | 2013-05-16 | Arlen Anderson | Data clustering, segmentation, and parallelization
US20130124525A1 (en)* | 2011-11-15 | 2013-05-16 | Arlen Anderson | Data clustering based on candidate queries
US20130124585A1 (en)* | 2011-11-15 | 2013-05-16 | Kabushiki Kaisha Toshiba | File processing apparatus and file processing method
US8812564B2 (en)* | 2011-12-20 | 2014-08-19 | Sap Ag | Parallel uniqueness checks for partitioned tables
US20130166589A1 (en)* | 2011-12-23 | 2013-06-27 | Daniel Baeumges | Split processing paths for a database calculation engine
US20130166568A1 (en)* | 2011-12-23 | 2013-06-27 | Nou Data Corporation | Scalable analysis platform for semi-structured data
US20130173560A1 (en)* | 2012-01-03 | 2013-07-04 | Intelius Inc. | Dynamic record blocking
US8645399B2 (en)* | 2012-01-03 | 2014-02-04 | Intelius Inc. | Dynamic record blocking
US20130205028A1 (en)* | 2012-02-07 | 2013-08-08 | Rackspace Us, Inc. | Elastic, Massively Parallel Processing Data Warehouse
US20140330785A1 (en)* | 2012-03-29 | 2014-11-06 | Hitachi Data Systems Corporation | Highly available search index with storage node addition and removal
US20140101178A1 (en)* | 2012-10-08 | 2014-04-10 | Bmc Software, Inc. | Progressive analysis for big data
US9336024B1 (en)* | 2012-12-27 | 2016-05-10 | Google Inc. | Clustering for parallel processing
US20140207820A1 (en)* | 2013-01-18 | 2014-07-24 | Electronics And Telecommunications Research Institute | Method for parallel mining of temporal relations in large event file
US20140236951A1 (en)* | 2013-02-19 | 2014-08-21 | Leonid Taycher | Organizing books by series
US20140279992A1 (en)* | 2013-03-14 | 2014-09-18 | Bmc Software, Inc. | Storing and retrieving context senstive data in a management system
US20140279704A1 (en)* | 2013-03-15 | 2014-09-18 | Broadridge Financial Solutions, Inc. | Mapping consumer ownership of financial assets to geographic localities and computer-implemented methods and computer systems thereof
US20170206256A1 (en)* | 2013-03-15 | 2017-07-20 | Amazon Technologies, Inc. | Scalable analysis platform for semi-structured data
US20140324861A1 (en)* | 2013-04-26 | 2014-10-30 | Wal-Mart Stores, Inc. | Block Partitioning For Efficient Record Processing In Parallel Computing Environment
US20150066992A1 (en)* | 2013-09-03 | 2015-03-05 | Acxion Corporation | Change Value Database System and Method
US20150095351A1 (en)* | 2013-10-02 | 2015-04-02 | Google Inc. | Dynamic Shuffle Reconfiguration
US20150100592A1 (en)* | 2013-10-03 | 2015-04-09 | Google Inc. | Persistent Shuffle System
US20150149503A1 (en)* | 2013-11-26 | 2015-05-28 | Ab Initio Technology Llc | Parallel access to data in a distributed file system
US20150169622A1 (en)* | 2013-12-06 | 2015-06-18 | Zaius, Inc. | System and method for storing and retrieving data in different data spaces
US20150172369A1 (en)* | 2013-12-17 | 2015-06-18 | Yahoo! Inc. | Method and system for iterative pipeline
US20160012075A1 (en)* | 2013-12-25 | 2016-01-14 | Hitachi, Ltd. | Computer system and data management method
US20150254264A1 (en)* | 2013-12-30 | 2015-09-10 | Huawei Technologies Co., Ltd. | Method for Recording Transaction Log, and Database Engine
US20150220579A1 (en)* | 2014-02-05 | 2015-08-06 | International Business Machines Corporation | Optimization of an in memory data grid (imdg) schema based upon a no-sql document model
US20150293980A1 (en)* | 2014-04-11 | 2015-10-15 | Cellco Partnership (D/B/A Verizon Wireless) | Data compass
US20150347549A1 (en)* | 2014-06-02 | 2015-12-03 | International Business Machines Corporation | Database Query Processing Using Horizontal Data Record Alignment of Multi-Column Range Summaries
US20160034527A1 (en)* | 2014-08-01 | 2016-02-04 | International Business Machines Corporation | Accurate partition sizing for memory efficient reduction operations
US20160063042A1 (en)* | 2014-08-26 | 2016-03-03 | Fujitsu Limited | Computer-readable recording medium, data placement method, and data placement device
US20160070721A1 (en)* | 2014-09-04 | 2016-03-10 | International Business Machines Corporation | Parallel processing of a keyed index file system
US20160092452A1 (en)* | 2014-09-25 | 2016-03-31 | Mengjiao Wang | Large-scale processing and querying for real-time surveillance
US20160092552A1 (en)* | 2014-09-26 | 2016-03-31 | Oracle International Corporation | Method and system for implementing efficient classification and exploration of data
US20160117393A1 (en)* | 2014-10-22 | 2016-04-28 | David von Rickenbach | Combinatorial Business Intelligence
US20160171072A1 (en)* | 2014-12-16 | 2016-06-16 | Futurewei Technologies, Inc. | System and Method for Massively Parallel Processing Database
US20170242904A1 (en)* | 2015-03-11 | 2017-08-24 | Hitachi, Ltd. | Computer system and transaction processing management method
US20160292221A1 (en)* | 2015-03-31 | 2016-10-06 | International Business Machines Corporation | Vertically partitioned databases
US20160366575A1 (en)* | 2015-06-10 | 2016-12-15 | Sap Se | Mobile digital cellular telecommunication system with advanced functionality for rating correction
US20160378752A1 (en)* | 2015-06-25 | 2016-12-29 | Bank Of America Corporation | Comparing Data Stores Using Hash Sums on Disparate Parallel Systems
US20170147674A1 (en)* | 2015-11-23 | 2017-05-25 | Ab Initio Technology Llc | Storing and retrieving data of a data cube
US20170199896A1 (en)* | 2016-01-13 | 2017-07-13 | American Express Travel Related Services Company, | Systems and methods for processing binary mainframe data files in a big data environment
US20170206208A1 (en)* | 2016-01-20 | 2017-07-20 | Oracle International Corporation | System and method for merging a mainframe data file to a database table for use by a mainframe rehosting platform

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10191952B1 (en)* | 2017-07-25 | 2019-01-29 | Capital One Services, Llc | Systems and methods for expedited large file processing
US10949433B2 (en) | 2017-07-25 | 2021-03-16 | Capital One Services, Llc | Systems and methods for expedited large file processing
US11625408B2 (en) | 2017-07-25 | 2023-04-11 | Capital One Services, Llc | Systems and methods for expedited large file processing
US12111838B2 (en) | 2017-07-25 | 2024-10-08 | Capital One Services, Llc | Systems and methods for expedited large file processing
US10831708B2 (en)* | 2017-12-20 | 2020-11-10 | Mastercard International Incorporated | Systems and methods for improved processing of a data file
US11789962B1 (en)* | 2020-02-07 | 2023-10-17 | Hitps Llc | Systems and methods for interaction between multiple computing devices to process data records

Also Published As

Publication number | Publication date
US10949433B2 (en) | 2021-03-16
US20190129896A1 (en) | 2019-05-02
US10191952B1 (en) | 2019-01-29
US20190034495A1 (en) | 2019-01-31
US20230244680A1 (en) | 2023-08-03
US11625408B2 (en) | 2023-04-11
US12111838B2 (en) | 2024-10-08
US20210200757A1 (en) | 2021-07-01

Similar Documents

Publication | Title
US12111838B2 (en) | Systems and methods for expedited large file processing
CN111241387A (en) | Improving relevance of search results
US10599985B2 (en) | Systems and methods for expediting rule-based data processing
US9720946B2 (en) | Efficient storage of related sparse data in a search index
US12277105B2 (en) | Methods and systems for improved search for data loss prevention
US11960880B2 (en) | Systems and methods for remediation of software configuration
US12135708B2 (en) | Systems and methods for maintaining data quality in a data store receiving both low and high quality data
CN110188100A (en) | Data processing method, device and computer storage medium
EP4332814A1 (en) | Systems and methods for secure storage of sensitive data
KR101614890B1 (en) | Method of creating multi tenancy history, server performing the same and storage media storing the same
CN115878655A (en) | Data operation method and device, computer equipment and storage medium
US20170330236A1 (en) | Enhancing contact card based on knowledge graph
CN111414162B (en) | Data processing method, device and equipment thereof
JP7302344B2 (en) | Maintenance support program, maintenance support method and maintenance support device
CN114968575A (en) | Asynchronous task based repeated consumption prevention method and related device
CN119398874A (en) | Method, device, equipment and storage medium for viewing orders
CN118535535A (en) | File management method, device, computer equipment, storage medium and program product
CN117453561A (en) | Test script calling method, device, computer equipment and storage medium
CN116955357A (en) | Recognition method, device and computer equipment for object identification
CN114510492A (en) | Business data processing method, apparatus, computer equipment and storage medium
WO2018184552A1 (en) | Data display method, system, and terminal device for reserve system, and storage medium
WO2018165959A1 (en) | E-commerce data cleaning system and method
TW201812616A (en) | System, apparatus and method for integration

Legal Events

Code | Title

AS | Assignment
Owner name: CAPITAL ONE SERVICES, LLC, VIRGINIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: BHATT, JAPAN; REEL/FRAME: 043859/0206
Effective date: 2017-08-08

STCF | Information on status: patent grant
Free format text: PATENTED CASE

MAFP | Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 4

