
Graphic processor based accelerator system and method

Info

Publication number
USRE48438E1
Authority
US
United States
Prior art keywords
computations
memory
sequence
processing unit
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/808,201
Inventor
Anatoli Gorchetchnikov
Heather Marie Ames
Massimiliano Versace
Fabrizio Santini
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neural Ai LLC
Original Assignee
Neurala Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=39416485&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=USRE48438(E1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Neurala Inc
Priority to US15/808,201
Assigned to NEURALA, INC. (entity conversion from NEURALA LLC)
Assigned to NEURALA LLC (assignment of assignors interest; assignors: AMES, HEATHER MARIE; GORCHETCHNIKOV, ANATOLI; SANTINI, FABRIZIO; VERSACE, MASSIMILIANO)
Priority to US17/136,343 (USRE49461E1)
Application granted
Publication of USRE48438E1
Assigned to NEURAL AI, LLC (assignment of assignors interest; assignor: NEURALA, INC.)
Status: Active
Anticipated expiration

Abstract

An accelerator system is implemented on an expansion card comprising a printed circuit board having (a) one or more graphics processing units (GPUs), (b) two or more associated memory banks (logically or physically partitioned), (c) a specialized controller, and (d) a local bus providing signal coupling compatible with the PCI industry standards. The controller handles most of the primitive operations needed to set up and control GPU computation, so the computer's central processing unit (CPU) can be dedicated to other tasks. In this case only a few controls (simulation start and stop signals from the CPU and the simulation completion signal back to the CPU), GPU programs, and input/output data are exchanged between the CPU and the expansion card. Moreover, since on every time step of the simulation the results from the previous time step are used but not changed, the results are preferably transferred back to the CPU in parallel with the computation.

Description

RELATED APPLICATIONS
The present application is a broadening reissue application of U.S. Pat. No. 9,189,828, filed Jan. 3, 2014, which claims a priority benefit, under 35 U.S.C. §120, as a continuation of U.S. application Ser. No. 11/860,254, now U.S. Pat. No. 8,648,867 B2, filed Sep. 24, 2007, entitled “Graphic Processor Based Accelerator System and Method,” which in turn claims the priority benefit, under 35 U.S.C. §119(e), of U.S. Application No. 60/826,892, filed Sep. 25, 2006. Each of the above-identified applications is incorporated herein by reference in its entirety. More than one reissue application has been filed for the reissue of U.S. Pat. No. 9,189,828, including this application and a reissue continuation application filed Dec. 29, 2020.
BACKGROUND
Graphics Processing Units (GPUs) are found in video adapters (graphic cards) of most personal computers (PCs), video game consoles, workstations, etc. and are considered highly parallel processors dedicated to fast computation of graphical content. With the advances of the computer and console gaming industries, the need for efficient manipulation and display of 3D graphics has accelerated the development of GPUs.
In addition, manufacturers of GPUs have included general-purpose programmability in the GPU architecture, leading to the increased popularity of using GPUs for highly parallelizable and computationally expensive algorithms outside of the computer graphics domain. When implemented on conventional video card architectures, however, these general-purpose GPU (GPGPU) applications are not able to achieve optimal performance. There is overhead for graphics-related features and algorithms that are not necessary for these non-video applications.
SUMMARY
Numerical simulations, e.g., finite element analysis, of large systems of similar elements (e.g., neural networks, genetic algorithms, particle systems, mechanical systems) are one example of an application that can benefit from GPGPU computation. During numerical simulations, disk and user input/output can be performed independently of computation because these two processes require interactions with peripheral hardware (disk, screen, keyboard, mouse, etc.) and place a relatively low load on the central processing unit (CPU). Complete independence is not desirable, however; user input might affect how the computation is performed and even interrupt it if necessary. Furthermore, the user output and the disk output depend on the results of the computation. A reasonable solution is to separate input/output into threads, so that their interaction with hardware occurs in parallel with the computation. In this case, whatever CPU processing is required for input/output should be designed to provide synchronization with the computation.
In the case of GPGPU, the computation itself is performed outside of the CPU, so the complete system comprises three “peripheral” components: user interactive hardware, disk hardware, and computational hardware. The central processing unit (CPU) establishes communication and synchronization between peripherals. Each of the peripherals is preferably controlled by a dedicated thread that is executed in parallel with minimal interactions and dependencies on the other threads.
A GPU on a conventional video card is usually controlled through OpenGL, DirectX, or similar graphic application programming interfaces (APIs). Such APIs establish the context of graphic operations, within which all calls to the GPU are made. This context only works when initialized within the same thread of execution that uses it. As a result, in a preferred embodiment, the context is initialized within a computational thread. This creates complications, however, in the interaction between the user interface thread that changes parameters of simulations and the computational thread that uses these parameters.
A solution as proposed here is an implementation of the computational stream of execution in hardware, so that thread and context initialization are replaced by hardware initialization. This hardware implementation includes an expansion card comprising a printed circuit board having (a) one or more graphics processing units, (b) two or more associated memory banks that are logically or physically partitioned, (c) a specialized controller, and (d) a local bus providing signal coupling compatible with the PCI industry standards (this includes but is not limited to PCI-Express, PCI-X, USB 2.0, or functionally similar technologies). The controller handles most of the primitive operations needed to set up and control GPU computation. As a result, the CPU is freed from this function and is dedicated to other tasks. In this case a few controls (simulation start and stop signals from the CPU and the simulation completion signal back to the CPU), GPU programs, and input/output data are the information exchanged between the CPU and the expansion card. Moreover, since on every time step of the simulation the results from the previous time step are used but not changed, the results are preferably transferred back to the CPU in parallel with the computation.
In general, according to one aspect, the invention features a computer system. This system comprises a central processing unit, main memory accessed by the central processing unit, and a video system for driving a video monitor in response to the central processing unit, as is common. The computer system further comprises an accelerator that uses input data from and provides output data to the central processing unit. This accelerator comprises at least one graphics processing unit, accelerator memory for the graphics processing unit, and an accelerator controller that moves the input data into the at least one graphics processing unit and the accelerator memory to generate the output data.
In the preferred embodiment, the central processing unit transfers the input data for a simulation to the accelerator, after which the accelerator executes simulation computations to generate the output data, which is transferred to the central processing unit. Preferably, the accelerator controller dictates an order of execution of instructions to the at least one graphics processing unit. The use of the separate controller enables data transfer during execution, such that the accelerator controller transfers output data from the accelerator memory to the main memory of the central processing unit.
In the preferred embodiment, the accelerator controller comprises an interface controller that enables the accelerator to communicate over a bus of the computer system with the central processing unit.
In general, according to another aspect, the invention also features an accelerator system for a computer system, which comprises at least one graphics processing unit, accelerator memory for the graphics processing unit, and an accelerator controller for moving data between the at least one graphics processing unit and the accelerator memory.
In general, according to another aspect, the invention also features a method for performing numerical simulations in a computer system. This method comprises a central processing unit loading input data into an accelerator system from main memory of the central processing unit and an accelerator controller transferring the input data to a graphics processing unit with instructions to be performed on the input data. The accelerator controller then transfers the output data generated by the graphics processing unit to the central processing unit.
The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
In the accompanying drawings, reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis has instead been placed upon illustrating the principles of the invention. Of the drawings:
FIG. 1 is a schematic diagram illustrating a computer system including the GPU accelerator according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating the architecture for the GPU accelerator according to an embodiment of the present invention;
FIG. 3 is a block/flow diagram illustrating an exemplary implementation of the top level control of the GPU accelerator system;
FIG. 4 is a flow diagram illustrating an exemplary implementation of the bottom level control of the GPU accelerator system that is used to execute the target computation; and
FIG. 5 shows an example population of nine computational elements arranged in a 3×3 square and potential schemes for packing them into texture pixels, according to an implementation of the present invention.
DETAILED DESCRIPTION
FIG. 1 shows a computer system 100 that has been constructed according to the principles of the present invention.
In more detail, the computer system 100 in one example is a standard personal computer (PC). However, this only serves as an example environment, as the computing environment 100 does not necessarily depend on or require any combination of the components that are illustrated and described herein. In fact, there are many other suitable computing environments for this invention, including, but not limited to, workstations, server computers, supercomputers, notebook computers, hand-held electronic devices such as cell phones, mp3 players, or personal digital assistants (PDAs), multiprocessor systems, programmable consumer electronics, networks of any of the above-mentioned computing devices, and distributed computing environments that include any of the above-mentioned computing devices.
In one implementation, the GPU accelerator is implemented as an expansion card 180 that connects to the motherboard 110, on which the one or more CPUs 120 are installed along with main, or system, memory 130 and mass/nonvolatile data storage 140, such as a hard drive or redundant array of independent drives (RAID) array, for the computer system 100. In the current example, the expansion card 180 communicates with the motherboard 110 via a local bus 190. This local bus 190 could be PCI, PCI Express, PCI-X, or any other functionally similar technology (depending upon availability on the motherboard 110). An external GPU accelerator is also a possible implementation. In this example, the external GPU accelerator is connected to the motherboard 110 through USB 2.0, IEEE 1394 (FireWire), or a similar external/peripheral device interface.
The CPU 120 and the system memory 130 on the motherboard 110 and the mass data storage system 140 are preferably independent of the expansion card 180 and only communicate with each other and the expansion card 180 through the system bus 200 located in the motherboard 110. System buses 200 in current generations of computers have bandwidths from 3.2 GB/s (Pentium 4 with AGTL+, Athlon XP with EV6) to around 15 GB/s (Xeon Woodcrest with AGTL+, Athlon 64/Opteron with HyperTransport), while the local bus has maximal peak data transfer rates of 4 GB/s (PCI Express x16) or 2 GB/s (PCI-X 2.0). Thus the local bus 190 becomes a bottleneck in the information exchange between the system bus 200 and the expansion card 180. The design of the expansion card and the methods proposed herein minimize the data transfer through the local bus 190 to reduce the effect of this bottleneck.
The system memory 130 is referred to as the main random-access memory (RAM) in the description herein. However, this is not intended to limit the system memory 130 to only RAM technology. Other possible computer storage media include, but are not limited to, ROM, EEPROM, flash memory, or any other memory technology.
In the illustrated example, the GPU accelerator system is implemented on an expansion card 180 on which the one or more GPUs 240 are mounted. It should be noted that the GPU accelerator system's GPU 240 is separate from and independent of any GPU on the standard video card 150 or other video driving hardware such as integrated graphics systems. Thus the computations performed on the expansion card 180 do not interfere with graphics display (including but not limited to manipulation and rendering of images).
Various brands of GPU are relevant. Under current technology, examples include GPUs based on the GeForce series from NVIDIA Corporation or the Catalyst series from ATI/Advanced Micro Devices, Inc.
The output to a video monitor 170 is preferably through the video card 150 and not the GPU accelerator system 180. The video card 150 is dedicated to the transfer of graphical information and connects to the motherboard 110 through a local bus 160 that is sometimes physically separate from the local bus 190 that connects the expansion card 180 to the motherboard 110.
FIG. 2 is a block diagram illustrating the general architecture of the GPU accelerator system, and specifically the expansion card 180, in which at least one GPU 240 and associated memories 210 and 250 are mounted. Electrical (signal) and mechanical coupling with a local bus 190 provides signal coupling compatible with the PCI industry standards (this includes but is not limited to PCI, PCI-X, PCI Express, or functionally similar technology).
The GPU accelerator further preferably comprises one specifically designed accelerator controller 220. Depending upon the implementation, the accelerator controller 220 is field programmable gate array (FPGA) logic or a custom-built application-specific integrated circuit (ASIC) chip mounted on the expansion card 180, in mechanical and signal coupling with the GPU 240 and the associated memories 210 and 250. During initial design, the controller can be partially or even fully implemented in software, in one example.
The controller 220 commands the storage and retrieval of arrays of data (on a conventional video card the arrays of data are represented as textures, hence the term 'texture' in this document refers to a data array unless specified otherwise, and each element of the texture is a pixel of color information), the execution of GPU programs (on a conventional video card these programs are called shaders, hence the term 'shader' in this document refers to a GPU program unless specified otherwise), and the data transfer between the system bus 200 and the expansion card 180 through the local bus 190, which allows communication between the main CPU 120, RAM 130, and disk 140.
Two memory banks 210 and 250 are mounted on the expansion card 180. In some examples, these memory banks are separated in the hardware, as shown, or alternatively implemented as a single, logically partitioned memory component.
The reason to separate the memory into two partitions 210, 250 stems from the nature of the computations to which the GPU accelerator system is applied. The elements of computation (computational elements) are characterized by a single output variable. Such computational elements often include one or more equations. Computational elements are the same or similar within a large population and are computed in parallel. An example of such a population is a layer of neurons in an artificial neural network (ANN), where all neurons are described by the same equation. As a result, some data and most of the algorithms are common to all computational elements within a population, while most of the data and some algorithms are specific to each equation. Thus, one memory, the shader memory bank 210, is used to store the shaders needed for the execution of the required computations and the parameters that are common to all computational elements, and is coupled with the controller 220 only. The second memory, the texture memory bank 250, is used to store all the necessary data that are specific to every computational element (including, but not limited to, input data, output data, intermediate results, and parameters) and is coupled with both the controller 220 and the GPU 240.
The texture memory bank 250 is preferably further partitioned into four sections. The first partition 250a is designed to hold the external input data patterns. The second partition 250b is designed to hold the data textures representing internal variables. The third partition 250c is designed to hold the data textures used as input at a particular computation step on the GPU 240. The fourth partition 250d holds the data textures used to accommodate the output of a particular computational step on the GPU 240. This partitioning can be done logically and does not require a hardware implementation. The partitioning scheme can also be altered based on new designs or the needs of the algorithms being employed. The reason for this partitioning is further explained in the Data Organization section, below.
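For illustration only, the four-way split can be pictured as a set of base offsets into one physical bank. The following is a minimal sketch under that assumption; the type and field names are hypothetical and do not come from the patent:

#include <cstdint>

// Hypothetical descriptor of the logical partitioning of the texture memory
// bank 250. Each partition is just a base offset into the same physical bank,
// so the split requires no hardware support.
struct TextureMemoryLayout {
    std::uintptr_t externalInputBase; // partition 250a: external input patterns
    std::uintptr_t internalVarBase;   // partition 250b: internal variables
    std::uintptr_t stepInputBase;     // partition 250c: inputs of the current step
    std::uintptr_t stepOutputBase;    // partition 250d: outputs of the current step
};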
A local bus interface 230 on the controller 220 serves as a driver that allows the controller 220 to communicate through the local bus 190 with the system bus 200 and thus the CPU 120 and RAM 130. This local bus interface 230 is not intended to be limited to PCI-related technology. Other drivers can be used to interface with comparable technology as a local bus 190.
Data Organization
Each computational element discussed above has output variables that affect the rest of the system. For example, in the case of a neural network it is the output of a neuron. A computational element also usually has several internal variables that are used to compute the output variables but are typically not exposed to the rest of the system, not even to other elements of the same population. Each of these variables is represented as a texture. The important difference between output variables and internal variables is how they are accessed.
Output variables are usually accessed by any element in the system during every time step. The value of the output variable that is accessed by other elements of the system corresponds to the value computed on the previous, not the current, time step. This is realized by dedicating two textures to each output variable: one holds the value computed during the previous time step and is accessible to all computational elements during the current time step, while the other is not accessible to other elements and is used to accumulate new values for the variable computed during the current time step. In between time steps these two textures are switched, so that the newly accumulated values serve as accessible input during the next time step, while the old input is replaced with new values of the variable. This switch is implemented by swapping the address pointers to the respective textures, as described in the System and Framework section.
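As a minimal sketch of this double-buffering scheme (the class and member names are hypothetical, and ordinary C++ buffers stand in for the textures; on the actual hardware only base addresses are swapped):

#include <utility>
#include <vector>

// Two buffers per output variable: 'front' holds the values computed on the
// previous time step and is readable by every element; 'back' accumulates
// the values being computed on the current time step.
struct OutputVariable {
    std::vector<float> front;
    std::vector<float> back;

    // Called between time steps: newly accumulated values become the
    // accessible input for the next step. No data is copied.
    void swapBuffers() { std::swap(front, back); }
};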
Internal variables are computed and used within the same computational element. Because the processing within an element is sequential, there is no chance of a race condition in which a value is used before it is computed or after it has already changed on the next time step. Therefore, it is possible to render the new value of an internal variable into the same texture in the texture memory bank from which the old value was read. Rendering to more than one texture from a single shader is not implemented in current GPU architectures, so computational elements that track internal variables have to have one shader per variable. These shaders can be executed in order, with internal variables computed first, followed by output variables.
Further savings of texture memory are achieved by using multiple color components per pixel (texture element) to hold data. Textures can have up to four color components that are all processed in parallel on a GPU. Thus, to maximize the use of the GPU architecture it is desirable to pack the data in such a way that all four components are used by the algorithm. Even though each computational element can have multiple variables, designating one texture pixel per element is ineffective because internal variables require one texture and output variables require two textures. Furthermore, different element types have different numbers of variables, and unless this number is precisely a multiple of four, texture memory can be wasted.
A more reasonable packing scheme is to pack four computational elements into a pixel and have separate textures for every variable associated with each computational element. In this case the packing scheme is identical for all textures, and the textures can therefore be accessed using the same algorithm. Several ways to approach this packing scheme are outlined here. An example population of nine computational elements arranged in a 3×3 square (FIG. 5a) can be packed by element (FIG. 5b), by row (FIG. 5c), or by square (FIG. 5d).
Packing by element (FIG. 5b) means that elements 1, 2, 3, 4 go into the first pixel; 5, 6, 7, 8 go into the second pixel; and 9 goes into the third pixel. This is the most compact scheme, but it is not convenient because the geometrical relationship is not preserved during packing, and its extraction depends on the size of the population.
Packing by row (or column; FIG. 5c) means that elements 1, 2, 3 go into pixel (1,1); 4, 5, 6 go into pixel (2,1); and 7, 8, 9 go into pixel (3,1). With this scheme the element's y coordinate in the population is the pixel's y coordinate, while the element's x coordinate in the population is the pixel's x coordinate times four plus the index of the color component. A five-by-five population in this case will use a 2×5 texture, or 10 pixels. Five of these pixels will only use one out of four components, so it wastes 37.5% of this texture. A 25×1 population will use a 7×1 texture (seven pixels) and will waste 10.7% of it.
Packing by square (FIG. 5d) means that elements 1, 2, 4, 5 go into pixel (1,1); 3, 6 go into pixel (1,2); 7, 8 go into pixel (2,1); and 9 goes into pixel (2,2). Both the row and the column of the element are determined from the row (column) of the pixel times two plus the second (first) bit of the color component index. A five-by-five population in this case will use a 3×3 texture, or 9 pixels. Four of these pixels will only use two out of four components, and one will only use one component, so it wastes 30.6% of this texture. This is more advantageous than packing by row, since the texture is smaller and the waste is also lower. A 25×1 population, on the other hand, will use a 13×1 texture (thirteen pixels) and waste >50% of it, which is much worse than packing by row.
In order to eliminate waste altogether the population should have even dimensions in the square packing, and it should have a number of columns divisible by four in row packing. Theoretically, the chances are approximately equivalent for both of these cases to occur, so the particular task and data sizes should determine which packing scheme is preferable in each individual case.
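The index arithmetic for the two regular schemes is compact enough to show directly. The following is a small self-contained sketch (hypothetical helper names; zero-based element and pixel coordinates are assumed, whereas the figures above count from one):

#include <cstdio>

struct Texel { int px, py, component; }; // pixel coordinates plus color component 0..3

// Packing by row: four horizontally adjacent elements share one pixel.
Texel packByRow(int x, int y) {
    return { x / 4, y, x % 4 };
}

// Packing by square: each 2x2 block of elements shares one pixel; bit 0 of
// the component index is the column parity, bit 1 is the row parity.
Texel packBySquare(int x, int y) {
    return { x / 2, y / 2, (y % 2) * 2 + (x % 2) };
}

int main() {
    // Element (2,1) of the 3x3 example population, packed both ways.
    Texel r = packByRow(2, 1);
    Texel s = packBySquare(2, 1);
    std::printf("row:    pixel (%d,%d), component %d\n", r.px, r.py, r.component);
    std::printf("square: pixel (%d,%d), component %d\n", s.px, s.py, s.component);
    return 0;
}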
The System and Framework
FIG. 3 shows an exemplary implementation of the top level system and method that is used to control the computation. It is a representation of one of several ways in which a system and method for processing numerical techniques can be implemented in the invention described herein, and so the implementation is not intended to be limited to the following description and accompanying figure.
The method presented herein includes two execution streams that run on the CPU 120: the User Interaction Stream 302 and the Data Output Stream 301. These two streams preferably do not interact directly, but depend on the same data accumulated during simulations. They can be implemented as separate threads with shared memory access and executed on different CPUs in the case of a multi-CPU computing environment. The third execution stream, the Computational Stream 303, runs on the GPU accelerator of the expansion card 180 and interacts with the User Interaction Stream 302 through initialization routines and data exchange in between simulations. The Computational Stream 303 interacts with the User Interaction Stream and the Data Output Stream through synchronization procedures during simulations.
The crucial feature of the interaction between the User Interaction Stream 302 and the Computational Stream 303 is the shift of priorities. Outside of the simulation, the system 100 is driven by the user input, so the User Interaction Stream 302 has the priority and controls the data exchange 304 between streams. After the user starts the simulation, the Computational Stream 303 takes the priority and controls the data exchange between streams until the simulation is finished or interrupted 350.
The user starts 300 the framework through the means of an operating system and interacts with the software through the user interaction section 305 of the graphic user interface 306 executed on the CPU 120. The start 300 of the implementation begins with a user action that causes a GUI initialization 307, disk input/output initialization 308 on the CPU 120, and controller initialization 320 of the GPU accelerator on the expansion card 180. GUI initialization includes opening the main application window and setting up the interface tools that allow the user to control the framework. Disk I/O initialization can be performed at the start of the framework or at the start of each individual simulation.
The user interaction 305 controls the setting and editing of the computational elements, parameters, and sources of external inputs. It specifies which equations should have their output saved to disk and/or displayed on the screen. It allows the user to start and stop the simulation. And it performs standard interface functions such as file loading and saving, interactive help, general preferences, and others.
The user interaction 305 directs the CPU 120 to acquire the new external input textures needed (this includes but is not limited to loading them from disk 140 or receiving them in real time from a recording device), parse them if necessary 309, and initialize their transfer to the expansion card 180, where they are stored 325 in the texture memory bank 250 by the controller 220. The user interaction 305 also directs the CPU 120 to parse the populations of elements that will be used in the simulation, convert them to GPU programs (shaders), compile them 310, and initialize their transfer to the expansion card 180, where they are stored 326 in the shader memory bank 210 by the controller 220. This operation is accompanied by the upload 309 of the initial data into the input partition of the texture memory bank 250, and stores the shader order of execution in the controller 220. The user can perform operations 309 and 310 as many times as necessary prior to starting the simulation or between simulations.
The editing of the system between simulations is difficult to accomplish without the hardware implementation of the computational thread suggested herein. The system of equations (computational elements) is represented by textures that track variables plus shaders that define processing algorithms. As mentioned above, textures, shaders, and other graphics-related constructs can only be initialized within the rendering context, which is thread specific. Therefore textures and shaders can only be initialized in the computational thread.
Network editing is a user-interactive process, which according to the scheme suggested above happens in the User Interaction Stream 302. The simulation software thus has to take the new parameters from the User Interaction Stream 302, communicate them to the Computational Stream 303, and regenerate the necessary shaders and textures. This is hard to accomplish without a hardware implementation of the Computational Stream 303. The Computational Stream 303 is forked from the User Interaction Stream and can access the memory of the parent thread, but the reverse communication is harder to achieve. The controller 220 allows operations 309 and 310 to be performed as many times as necessary by providing the necessary communication to the User Interaction Stream 302.
After the input parser texture generation 309 and the population parser shader generation and compilation 310 are performed at least once, the user has the option to initialize the simulation 311. During this initialization, the main control of the framework is transferred to the GPU accelerator system's accelerator controller 220 and the computation 330 is started (see FIG. 4; 420). The user retains the ability to interrupt the simulation, change the input, or change the display properties of the framework, but these interactions are queued to be performed at times determined by the controller-driven data exchange 314 and 316 to avoid corruption of the data.
The progress monitor 312 is not necessary for performance, but adds convenience. It displays the percentage of completed time steps of the simulation and allows the user to plan a schedule using estimates of the simulation wall clock times. The controller-driven data exchange 314 updates the display of the results 313. Online screen output for the user-selected population allows the user to monitor the activity and evaluate the qualitative behavior of the network. Simulations with unsatisfactory behavior can be terminated early to change parameters and restart. The controller-driven data exchange 314 also drives the output of the results to disk 317. Data output to disk can, for convenience, be done on an element-per-file basis. A suggested file format includes a leftmost column that holds the simulated time for each of the simulation steps and subsequent columns that hold the variable values during this time step in all elements with identical equations (e.g., all neurons in a layer of a neural network).
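A minimal sketch of a writer for this suggested format (the function name is hypothetical; tab-separated columns are an assumption, since the patent does not fix a delimiter):

#include <cstdio>
#include <vector>

// Appends one simulation step to an already-open per-element file: simulated
// time in the leftmost column, then the value of the variable in every
// element that shares the same equation (e.g., one layer of neurons).
void writeStep(std::FILE* f, double simTime, const std::vector<float>& values) {
    std::fprintf(f, "%g", simTime);
    for (float v : values)
        std::fprintf(f, "\t%g", v);
    std::fprintf(f, "\n");
}

int main() {
    std::vector<float> layer(9, 0.5f); // e.g., the nine-element population above
    writeStep(stdout, 0.05, layer);    // one row for the step at t = 0.05
    return 0;
}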
The controller-driven data exchange or input parser texture generator 316 allows the user to change input that is generated on the fly during the simulation. This allows the framework to monitor input that is coming from a recording device (video camera, microphone, cell recording electrode, etc.) in real time. Similar to the initial input parser 309, it preprocesses the input into a universal data array format suitable for texture generation and generates the textures. Unlike the initial parser 309, here the textures are transferred to the hardware not whenever ready but upon the request of the controller 220.
The controller 220 also drives the conditional testing 315 and 318 that informs the CPU-bound streams whether the simulation is finished. If so, control returns to the User Interaction Stream. The user then can change parameters or inputs (309 and 310), restart the simulation (311), or quit the framework (390).
SANNDRA (Synchronous Artificial Neuronal Network Distributed Runtime Algorithm; http://www.kinness.net/Docs/SANNDRA/html) was developed to accelerate and optimize the processing of numerical integration of large non-homogeneous systems of differential equations. The library was fully reworked in its version 2.x.x to support multiple computational backends, including those based on multicore CPUs, GPUs, and other processing systems. The GPU-based backend for SANNDRA-2.x.x can serve as an example practical software implementation of the method and architecture described above and pictorially represented in FIG. 3.
To use SANNDRA, the application should create a TSimulator object either directly or through inheritance. This object handles global simulation properties and controls the User Interaction Stream, Data Output Stream, and Computational Stream. Through TSimulator::timestep( ), TSimulator::outfileInterval( ), and TSimulator::outmode( ), the application can set the time step of the simulation, the time step of the disk output, and the mode of the disk output. The external input pattern should be packed into a TPattern object and bound to the simulation object through the TSimulator::resetInputs( ) method. TSimulator::simLength( ) sets the length of the simulation.
The second step is to create at least one population of equations (a TPopulation object). A population holds one equation object, TEquation. This object contains only a formula and does not hold element-specific data, so all elements of the population can share a single TEquation.
The TEquation object is converted to a GPU program before execution. GPU programs have to be executed within a graphical context, which is stream specific. TSimulator creates this context within the Computational Stream; therefore all programs and data arrays that are necessary for computation have to be initialized within the Computational Stream. The constructor of TPopulation is called from the User Interaction Stream, so no GPU-related objects can be initialized in this constructor.
TPopulation::fillElements( ) is a virtual method designed to overcome this difficulty. It is called from within the Computational Stream after TSimulator::networkCreate( ) is called in the User Interaction Stream. A user has to override TPopulation::fillElements( ) to create TEquation and other computation-related objects, both element-independent and element-specific. Element-independent objects include subcomponents of TEquation and objects that describe how to handle interdependencies between variables, implemented through derivatives of the TGate class.
Element-specific data is held in TElement objects. These objects hold references to TEquation and a set of TGate objects. There is one TElement per population, but the size of the data arrays within this object corresponds to the population size. All TElement objects have to be added to the TSimulator list of elements by calling the TSimulator::addUnit( ) method from TPopulation::fillElements( ).
Finally, TPopulation::fillElements( ) should contain a set of TElement::add*Dependency( ) calls for each element. Each of these calls sets a corresponding dependency for every TGate object. Here the TGate object holds the element-independent part of the dependency and TElement::add*Dependency( ) sets the element-specific details.
The system-provided TPopulation handles the output of computational elements, both when they need to exchange data and when they need to output it to disk. A user implementation of a TPopulation derivative can add screen output.
Listing 1 is example code for a user program that uses a recurrent competitive field (RCF) equation:
Listing 1
#include <cstdint>
#include <cstdlib>
#include <iostream>
// SANNDRA library headers are assumed to be included here.

uint16_t w = 3, h = 3;
static float m_compet = 0.5;
static float m_persist = 1.0;

class TCablePopRCF : public TPopulation
{
    TEq_RCF* m_equation;
    TGate* m_gate1;
    TGate* m_gate2;

    void createGatingStructure()
    {
        m_gate1 = new TGate(0);
        m_gate2 = new TGate(1);
    }

    void createUnitStructure(TBasicUnit* u)
    {
        u->addO2OPInputDependency(m_gate1, 0., 0., 0.004, 0., 0, 0);
        u->addFullDependency(m_gate2, population());
    }

public:
    TCablePopRCF() : TPopulation("compCPU RCF", w, h, true),
        m_equation(0), m_gate1(0), m_gate2(0) { }

    ~TCablePopRCF()
    {
        if(m_equation) delete m_equation;
        if(m_gate1) delete m_gate1;
        if(m_gate2) delete m_gate2;
    }

    bool fillElements(TSimulator* sim);
};

bool TCablePopRCF::fillElements(TSimulator* sim)
{
    m_equation = new TEq_RCF(this, m_compet, m_persist);
    createGatingStructure();
    for(size_t i = 0; i < xSize(); ++i)
        for(size_t j = 0; j < ySize(); ++j)
        {
            TElement* u = new TCPUElement(this, m_equation, i, j);
            sim->addUnit(u);
            createUnitStructure(u);
        }
    return true;
}

int main()
{
    // Input pattern generation (309 in FIG. 3)
    uint32_t* pat = new uint32_t[w*h];
    TRandom<float> randGen(0);
    for(uint32_t i = 0; i < w*h; ++i)
        pat[i] = randGen.random();
    TPattern* p = new TPattern(pat, w, h);

    // Setting up the simulation
    TSimulator* cableSim = new TSimulator("data"); //(308 and 320 in FIG. 3)
    cableSim->timestep(0.05); //(320 in FIG. 3)
    cableSim->resetInputs(p); //(325 in FIG. 3)
    cableSim->outfileInterval(0.1); //(308 in FIG. 3)
    cableSim->outmode(SANNDRA::timefunc); //(308 in FIG. 3)
    cableSim->simLength(60.0); //(320 in FIG. 3)

    // Preparing the population
    TPopulation* cablePop = new TCablePopRCF(); //(310 in FIG. 3)
    cableSim->networkCreate(); //(326 in FIG. 3)

    uint16_t user = 1;
    while(user)
    {
        if(!cableSim->simulationStart(true, 1)) //(311 in FIG. 3)
            exit(1);
        std::cout << "Repeat?\n"; //(305 in FIG. 3)
        std::cin >> user; //(305 in FIG. 3)
        if(user == 1)
            cableSim->networkReset(); //(305 in FIG. 3)
    }
    if(cableSim)
        delete cableSim; // Also deletes cablePop and its internals
    return 0;
}
FIG. 4 is a detailed flow diagram illustrating a part of an exemplary implementation of the bottom level system and method performed during the computation on the GPU accelerator of the expansion card 180 and is a more detailed view of the computational box 330 in FIG. 3. FIG. 4 is a representation of one of several ways in which a system and method for processing numerical techniques can be implemented.
With systems of equations that have complex interdependencies, it is likely that a variable from some equation computed during the previous time step has to be used by some other equation after the new values of this variable have already been computed for the new time step. To avoid data confusion, the new values of variables should be rendered into a separate texture. After the time step is completed for all equations, these new values should be copied over the old values so that they are used as input during the next time step. Copying textures is a computationally expensive operation, but since textures are referred to by texture IDs (pointers), swapping these pointers for input and output textures after each time step achieves the same result at much lower cost.
In the hardware solution suggested herein, ID swapping is equivalent to swapping the base memory addresses of two partitions of the texture memory bank 250. They are swapped 485 during synchronization (485, 430, and 455), so that the computation 435-487 proceeds immediately and in parallel with the data transfer 445, as shown in FIG. 4. A hardware solution allows this parallelism through the access of the controller 220 to the onboard texture memory bank 250.
The main computation and data exchange are executed by the controller 220. It runs three parallel substreams of execution: the Computational Substream 403, the Data Output Substream 402, and the Data Input Substream 404. These substreams are synchronized with each other during the swap of pointers 485 to the input and output texture memory partitions of the texture memory bank 250 and the check for the last iteration 487. Algorithmically, these two operations are a single atomic operation, but the block diagram shows them as two separate blocks for clarity.
The Computational Substream 403 performs a computational cycle comprising a sequential execution of all shaders that were stored in the shader memory bank 210, using the appropriate input and output textures. To begin the simulation, the controller 220 initializes the three execution substreams 403, 402, and 404. On every simulation step, the Computational Substream 403 determines which textures the GPU 240 will need to perform the computations and initiates the upload 435 of them onto the GPU 240. The GPU 240 can communicate directly with the texture memory bank 250 to upload the appropriate texture to perform the computations. The controller 220 also pulls the first shader (known by the stored order) from the shader memory bank 210 and uploads 450 it onto the GPU 240.
The GPU 240 executes the following operations in this order: it performs the computation (execution of the shader) 470; it tells the controller 220 that it is done with the computations for the current shader; and after all shaders for this particular equation are executed, it sends 480 the output textures to the output portion of the texture memory bank 250. This cycle continues through all of the equations based on the branching step 482.
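This control flow can be sketched in a few lines of C++. The sketch below is purely illustrative: the Controller class and all of its methods (uploadTextures, uploadShader, and so on) are hypothetical stand-ins for the hardware transactions described above, since the actual controller 220 is FPGA or ASIC logic rather than software:

#include <utility>
#include <vector>

struct Shader { int id; };

class Controller {
public:
    Controller(int steps, std::vector<Shader> order)
        : m_steps(steps), m_order(std::move(order)) {}

    bool lastIteration(int step) const { return step >= m_steps; } // check 487
    const std::vector<Shader>& shaderOrder() const { return m_order; }

    void uploadTextures(const Shader&) {} // inputs for this shader (435)
    void uploadShader(const Shader&) {}   // program from the shader bank (450)
    void executeOnGpu(const Shader&) {}   // GPU runs the shader (470)
    void storeOutputs(const Shader&) {}   // results to the output partition (480)
    void waitForIoSubstreams() {}         // sync with substreams 402 and 404
    void swapInputOutputPointers() {}     // swap 485: outputs become next inputs

private:
    int m_steps;
    std::vector<Shader> m_order;
};

// Computational Substream 403: one outer iteration per simulation step, one
// inner iteration per shader in the stored order of execution.
void computationalSubstream(Controller& ctl) {
    for (int step = 0; !ctl.lastIteration(step); ++step) {
        for (const Shader& s : ctl.shaderOrder()) {
            ctl.uploadTextures(s);
            ctl.uploadShader(s);
            ctl.executeOnGpu(s);
            ctl.storeOutputs(s);
        }
        ctl.waitForIoSubstreams();
        ctl.swapInputOutputPointers();
    }
}

int main() {
    std::vector<Shader> order;
    order.push_back(Shader{0});
    order.push_back(Shader{1});
    Controller ctl(10, order); // ten simulation steps, two shaders
    computationalSubstream(ctl);
    return 0;
}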
An example shader that performs fourth-order Runge-Kutta numerical integration is shown in Listing 2 using GLSL notation:
Listing 2
uniform sampler2DRect Variable;
uniform float integration_step;

// define equation() here

vec4 rungekutta4(vec4 x)
{
    float halfstep = integration_step*0.5;
    float f1_6step = integration_step/6.0;
    vec4 k1 = equation(x);
    vec4 k2 = equation(x + halfstep*k1);
    vec4 k3 = equation(x + halfstep*k2);
    vec4 k4 = equation(x + integration_step*k3);
    return f1_6step*(k1 + 2.0*(k2 + k3) + k4);
}

void main(void)
{
    vec4 value = texture2DRect(Variable, gl_TexCoord[0].st);
    value += rungekutta4(value);
    gl_FragColor = value;
}
The shader in Listing 2 can be executed on a conventional video card. Using the controller 220, this code can be further optimized, however. Since the integration step does not change during the simulation, the step itself, as well as the half step and one sixth of the step, can be computed once per simulation and updated in all shaders by the shader update procedures 310, 326 discussed above.
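One way such an update could work is to substitute the per-simulation constants into the shader source before the shader is compiled and stored in the shader memory bank 210, so that the GPU never recomputes them. The following is a sketch only (the function name is hypothetical, and the naive textual replacement assumes the constant's name does not occur as a prefix of a longer identifier):

#include <cstddef>
#include <string>

// Replaces every occurrence of 'name' in the shader source with a numeric
// literal, baking quantities that are fixed for the whole simulation (the
// integration step, its half, and its sixth) directly into the program text.
std::string bakeConstant(std::string src, const std::string& name, float value) {
    const std::string literal = std::to_string(value);
    for (std::size_t pos = 0; (pos = src.find(name, pos)) != std::string::npos;
         pos += literal.size())
        src.replace(pos, name.size(), literal);
    return src;
}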
After all of the equations in the computational cycle are computed, the main execution substream 403 on the controller 220 can switch 485 the reference pointers of the input and output portions of the texture memory bank 250.
The two other substreams of execution on the controller 220 wait (blocks 430 and 455, respectively) for this switch to begin their execution. The Data Input Substream 404 controls 440 the input of additional data from the CPU 120. This is necessary in cases where the simulation is monitoring a changing input, for example input from a video camera or other recording device in real time. This substream uploads new external input from the CPU 120 to the texture memory bank 250, so that it can be used by the main Computational Substream 403 on the next computational step, and then waits for the next iteration 475. The Data Output Substream 402 controls 445 the output of simulation results to the CPU 120 if requested by the user. This substream uploads the results of the previous step to the main RAM 130, so that the CPU 120 can save them on disk 140 or show them on the results display 313, and then waits for the next iteration 460.
Since the Computational Substream 403 determines the timing of the input 440 and output 445 data transfers, these data transfers are driven by the controller 220. To further reduce the data transfer overhead (and the disk 140 overhead as well), the controller 220 initiates a transfer only after selected computational steps. For example, if the experimental data being simulated was recorded every 10 milliseconds (msec) and the simulation, for better precision, is computed every 1 msec, then only every tenth result has to be transferred to match the experimental frequency.
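As a sketch, this decimation reduces to a modulus test on the step counter (the function name is hypothetical):

// Transfer results only on every Nth computational step; with a 1 msec
// simulation step and data recorded every 10 msec, transferEvery would be 10.
bool shouldTransfer(long step, long transferEvery) {
    return step % transferEvery == 0;
}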
This solution stores two copies of the output data, one in the expansion card texture memory bank 250 and another in the system RAM 130. The copy in the system RAM 130 is accessed twice: for disk I/O and for screen visualization 313. An alternative solution would be to provide the CPU 120 with direct read access to the onboard texture memory bank 250 by mapping the memory of the hardware onto the global memory space. This alternative would double the communication through the local bus 190, however. Since the goal discussed herein is reducing the information transfer through the local bus 190, the former solution is favored.
The main substream 403 determines whether this is the last iteration 487. If it is the last iteration, the controller 220 waits for all of the execution substreams to finish 490 and then returns control to the CPU 120; otherwise it begins the next computational cycle.
This repeats through all of the computational cycles of the simulation.
CONCLUSION
This GPU accelerator system offers the following potential advantages:
1. Limited computations on the CPU 120. The CPU 120 is only used for user input, sending information to the controller 220, receiving output after each computational cycle (or less frequently, as defined by the user), writing this output to disk 140, and displaying this output on the monitor 170. This frees the CPU 120 to execute other applications and allows the expansion card to run at its full capacity without being slowed down by extensive interactions with the CPU 120.
2. Minimizing data transfer between the expansion card 180 and the system bus 200. All of the information needed to perform the simulations will be stored on the expansion card 180 and all simulations will take place on it. Furthermore, whatever data transfer remains necessary will take place in parallel with the computation, thus reducing the impact of this transfer on the performance.
3. New way to execute GPU programs (shaders). Previously, the CPU 120 had full control over the order of shader execution and was required to produce specific commands on every cycle to tell the GPU 240 which shader to use. With the invention disclosed herein, shaders will initially be stored in the shader memory bank 210 on the expansion card 180 and will be sent to the GPU 240 for execution by the general purpose controller 220 located on the expansion card.
4. Multiple parallelisms. The GPU 240 is inherently parallel and is well suited to perform parallel computations. In parallel with the GPU 240 performing the next calculation, the controller 220 is uploading the data from the previous calculation into the main memory 130. Furthermore, the CPU 120 at the same time uses the previously uploaded results to save them onto disk 140 and to display them on the screen through the system bus 200.
5. Reuse of existing and affordable technology. All hardware used in the invention and mentioned herein is based on currently available and reliable components. Further advances of these components will provide straightforward improvements of the invention.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (57)

What is claimed is:
1. A computer system, comprising:
a central processing unit to receive input data;
main memory, operably coupled to the central processing unit via a bus, to store the input data received by the central processing unit;
an accelerator, operably coupled to the central processing unit and the main memory via the bus, to receive at least a portion of the input data from the main memory, the accelerator comprising:
at least one graphics processing unit to perform a sequence of computations on the at least a portion of the input data so as to generate output data, the sequence of computations representing an artificial neural network, intermediate computations in the sequence of computations representing respective layers of the artificial neural network and yielding intermediate results; and
accelerator memory, operably coupled to the at least one graphics processing unit, to store the results of the sequence of computations; and
a controller, operably coupled to the at least one graphics processing unit and the accelerator memory, to initialize textures and shaders in the accelerator memory for performing the sequence of computations, to control performance of the sequence of computations by the at least one graphics processing unit, to transfer the at least a portion of the input data into the accelerator memory during performance of the intermediate computations in the sequence of computations by the at least one graphics processing unit, and to transfer at least a portion of the output data from the accelerator memory to the main memory during performance of the intermediate computations in the sequence of computations by the at least one graphics processing unit.
2. The computer system of claim 1, wherein the central processing unit is configured to receive the input data in response to a user interaction.
3. The computer system of claim 1, wherein:
the central processing unit is configured to receive the input data at a first rate; and
the at least one graphics processing unit is configured to perform the sequence of computations at a second rate different than the first rate.
4. The computer system of claim 1, wherein the main memory is configured to store a copy of the output data stored in the accelerator memory.
5. The computer system of claim 1, wherein an output of at least one computation in the sequence of computations represents an output of at least one neuron in an artificial neural network.
6. The computer system of claim 1, wherein the accelerator memory comprises:
a first memory bank to store parameters common to all of the computations in the sequence of computations; and
a second memory bank to store data specific to at least one computation in the sequence of computations.
7. The computer system of claim 1, wherein the controller is configured to transfer the output data from the accelerator memory to the main memory without transferring any of the intermediate results from the accelerator memory to the main memory so as to reduce data transfer via the bus.
8. The computer system of claim 1, wherein the controller is configured to transfer at least a portion of the output data from the accelerator memory to the main memory after the at least one graphics processing unit has begun to perform another sequence of computations.
9. The computer system of claim 8, wherein the controller is configured to initiate transfer of the at least a portion of the input data and to transfer the at least a portion of the output data in parallel with performance of at least one computation in the other sequence of computations by the at least one graphics processing unit.
10. The computer system of claim 1, wherein the controller is configured to control execution of the sequence of computations by the at least one graphics processing unit.
11. The computer system of claim 1, further comprising:
at least one of a video camera, a microphone, or a cell recording electrode, operably coupled to the central processing unit, to acquire the input data in real time.
12. A method of performing a sequence of computations representing an artificial neural network on a computer system comprising a central processing unit (CPU), a main memory operably coupled to the central processing unit via a bus, an accelerator operably coupled to the CPU and the main memory via the bus, the accelerator comprising a graphics processing unit (GPU) and an accelerator memory, the method comprising:
(A) performing, by the GPU, the sequence of computations on a first portion of the input data so as to generate a first portion of the output data, the first portion of the output data representing an output of a neuron in a first layer of the artificial neural network, intermediate computations in the sequence of computations yielding intermediate results, wherein performing the sequence of computations on the first portion of the input data comprises (i) assigning an output variable to a first texture and a second texture, the output variable being included in a first computational element of a plurality of computational elements, the plurality of computational elements representing the sequence of computations and (ii) accumulating a first value for the output variable in the first texture during a first time step;
(B) in parallel with performing the sequence of computations by the GPU in (A), transferring a second portion of the input data from the main memory to the accelerator via the bus; and
(C) in parallel with performing the sequence of computations by the GPU in (A), transferring a second portion of the output data from the accelerator memory to the main memory via the bus, the second portion of the output data representing an output of a neuron in a second layer in the artificial neural network; and
(D) performing, by the GPU, the sequence of computations on the second portion of the input data, wherein performing the sequence of computations on the second portion of the input data comprises (i) accumulating a second value for the output variable in the second texture during a second time step and (ii) making the first value of the output variable in the first texture accessible to other computational elements in the plurality of computational elements during the second time step.
13. The method of claim 12, further comprising:
storing the input data in the main memory in response to a user interaction.
14. The method of claim 12, further comprising:
receiving the input data at a first rate; and
wherein (A) comprises performing the sequence of computations at a second rate different than the first rate.
15. The method of claim 12, wherein (A) comprises:
generating an output representative of an output of at least one neuron in an artificial neural network.
16. The method of claim 12, wherein (C) comprises:
transferring the second portion of the output data from the accelerator memory to the main memory without transferring any of the intermediate results of the sequence of computations from the accelerator memory to the main memory so as to reduce data transfer via the bus.
17. The method of claim 12, wherein (C) comprises:
transferring the second portion of the output data from the accelerator memory to the main memory after the GPU has begun to perform another sequence of computations.
18. The method of claim 17, wherein (C) further comprises:
initiating transfer of the second portion of the output data in parallel with performance of at least one computation in the other sequence of computations.
19. The method of claim 12, further comprising:
acquiring the input data in real time with at least one of a video camera, a microphone, or a cell recording electrode operably coupled to the CPU.
20. The method of claim 12, further comprising:
storing parameters common to all of the computations in the sequence of computations in a first memory bank in the accelerator memory; and
storing data specific to at least one computation in the sequence of computations in a second memory bank in the accelerator memory.
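The two banks recited in claim 20 map naturally onto distinct GPU memory spaces. As a hedged sketch only, CUDA constant memory can play the role of the first bank (parameters shared by every computation) and ordinary device memory the role of the second (per-computation buffers); the parameter struct, kernel, and example values are invented for illustration.

    #include <cuda_runtime.h>

    // First bank: parameters common to every computation in the sequence.
    struct ModelParams { float decay; float gain; };
    __constant__ ModelParams g_params;

    // Second bank: data specific to one computation (one layer's buffers).
    __global__ void layer_kernel(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = g_params.gain * in[i] * (1.0f - g_params.decay);
    }

    void run_layer(const float* d_in, float* d_out, int n) {
        ModelParams p = {0.1f, 2.0f};                 // example values
        cudaMemcpyToSymbol(g_params, &p, sizeof(p));  // load the shared bank
        layer_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    }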
21. A method of performing a sequence of computations representing an artificial neural network, the method comprising:
receiving, at a central processing unit (CPU), first input data acquired from an external system in real time;
initializing, by a controller operably coupled to a graphics processing unit (GPU), textures and shaders in a memory operably coupled to the GPU;
transferring the first input data received by the CPU to the memory operably coupled to the GPU;
performing, by the graphics processing unit (GPU), a first computation in the sequence of computations on the first input data based on the textures and shaders to generate first output data, computations in the sequence of computations representing respective layers of neurons in the artificial neural network, an output of the first computation in the sequence of computations representing an output of a first neuron in a first layer in the artificial neural network;
storing, in the memory operably coupled to the GPU, the first input data and the first output data; and
transferring second input data acquired from the external system in real time into the memory operably coupled to the GPU after the GPU starts the first computation and before the GPU starts a second computation of the sequence of computations, an output of the second computation in the sequence of computations representing an output of a second neuron in a second layer in the artificial neural network.
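The timing recited in the last step, with the second input arriving after the first computation starts and before the second one begins, is the standard overlap of transfer and computation. Below is a minimal sketch under modern CUDA assumptions (a second stream for copies, and pinned host buffers so the copies can truly overlap); the trivial kernel is a stand-in for the claimed computation.

    #include <cuda_runtime.h>

    __global__ void layer(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 0.5f * in[i];       // stand-in computation
    }

    void two_layer_pipeline(const float* h_in1, const float* h_in2,
                            float* d_in1, float* d_in2, float* d_out,
                            int n) {
        // h_in1/h_in2 are assumed pinned (cudaMallocHost) so the copies
        // can proceed while the kernel runs.
        cudaStream_t compute, copy;
        cudaStreamCreate(&compute);
        cudaStreamCreate(&copy);

        // First input must be resident before the first computation starts.
        cudaMemcpyAsync(d_in1, h_in1, n * sizeof(float),
                        cudaMemcpyHostToDevice, compute);
        layer<<<(n + 255) / 256, 256, 0, compute>>>(d_in1, d_out, n);

        // Stage the second input while the first computation is running.
        cudaMemcpyAsync(d_in2, h_in2, n * sizeof(float),
                        cudaMemcpyHostToDevice, copy);

        cudaStreamSynchronize(copy);     // second input is now resident
        layer<<<(n + 255) / 256, 256, 0, compute>>>(d_in2, d_out, n);
        cudaStreamSynchronize(compute);
        cudaStreamDestroy(compute);
        cudaStreamDestroy(copy);
    }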
22. The method of claim 21, wherein transferring the second input data comprises transferring the second input data via a bus operably coupled to the CPU.
23. The method of claim 21, further comprising:
transferring the first output data from the memory to another memory during the second computation in the sequence of computations.
24. The method of claim 23, further comprising:
storing intermediate results of the sequence of computations in the memory, and
wherein transferring the first output data from the memory to the other memory occurs without transferring the intermediate results of the sequence of computations.
25. The method of claim 23, wherein transferring the second input data and transferring the first output data occurs in parallel.
26. The method of claim 21, further comprising:
storing, in a first memory partition of the memory, parameters common to all of the computations in the sequence of computations.
27. The method of claim 26, further comprising:
storing, in a second memory partition of the memory, data specific to the first computation in the sequence of computations.
28. The method of claim 27, further comprising:
storing, in the second memory partition, external input data patterns, representations of internal variables, an input of the first computation in the sequence of computations, and the output of the first computation in the sequence of computations.
29. The method of claim 21, wherein storing the first output data comprises:
accumulating, in the memory, outputs of computational elements executed by the GPU in performing the first computation in the sequence of computations.
30. The method of claim 21, further comprising:
storing, in the memory, an output of a previous computation in the sequence of computations; and
accessing, by the GPU, the output of the previous computation during performance of a current computation in the sequence of computations.
31. The method of claim 21, wherein performing the first computation comprises executing a plurality of computational elements representing a layer of neurons in an artificial neural network.
32. The method of claim 31, wherein all neurons in the layer of neurons are described by the same equation.
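Claims 31 and 32 describe a layer as many identical computational elements governed by one equation, which matches the single-program, multiple-data shape of a GPU kernel: every thread evaluates the same expression over its own state and input. In the sketch below the shared equation is a leaky integrator, an illustrative choice rather than anything recited in the patent.

    #include <cuda_runtime.h>

    // One thread per neuron; all threads apply the same equation,
    // dv/dt = -leak * v + input, integrated with a forward-Euler step.
    __global__ void layer_update(float* v, const float* input,
                                 float dt, float leak, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            v[i] += dt * (-leak * v[i] + input[i]);
    }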
33. The method of claim 21, further comprising:
acquiring the second input data with at least one of a video camera, a microphone, or a cell recording electrode.
34. The method of claim 21, further comprising:
loading the second input data from disk.
35. A system for performing a sequence of computations, the system comprising:
a camera to generate input data in real time;
a first memory partition;
a second memory partition operably coupled to the first memory partition; and
a processing unit, operably coupled to the camera, the first memory partition, and the second memory partition, to perform the sequence of computations on a first portion of the input data so as to generate a first portion of output data, intermediate computations in the sequence of computations yielding intermediate results, the first portion of the output data representing an output of an artificial neural network,
wherein the first memory partition is configured to transfer a second portion of the input data to the second memory partition in parallel with performance of the sequence of computations by the processing unit,
wherein the second memory partition is configured to transfer a second portion of the output data to the first memory partition in parallel with performance of the sequence of computations by the processing unit, and
wherein the sequence of computations represents the artificial neural network, each neuron in the artificial neural network has an output variable assigned to a first texture and a second texture in memory, the first texture holds a first value of the output variable computed during a previous time step of the sequence of computations and accessible to other neurons in the neural network during a current time step of the sequence of computations, and the second texture accumulates a second value of the output variable computed during the current time step.
36. The system of claim 35, wherein the first memory partition and the second memory partition are logical partitions.
37. The system of claim 35, wherein the processing unit comprises a graphics processing unit (GPU).
38. The system of claim 35, wherein the processing unit is configured to receive the input data at a first rate and to perform the sequence of computations at a second rate different than the first rate.
39. The system of claim 35, wherein the second memory partition is configured to transfer the second portion of the output data to the first memory partition without transferring any of the intermediate results to the first memory partition.
40. A system for executing an artificial neural network, the system comprising:
a central processing unit (CPU) to provide first input data;
a memory, operably coupled to the CPU, to store the first input data in a first partition, referenced by a first pointer, before computing a first layer of neurons of the artificial neural network;
a processing unit, operably coupled to the memory, to perform, during computation of the first layer of neurons, at least one calculation on the first input data so as to generate first output data, the first output data representing an output of at least one neuron in the first layer of neurons; and
a controller, operably coupled to the processing unit and the memory, to:
store the first output data in a second partition of the memory, the second partition referenced by a second pointer, and to swap the first pointer with the second pointer at the end of the computation of the first layer of neurons, such that the first output data becomes an input for a second layer of neurons of the artificial neural network,
transfer the first output data to another memory during computation of the second layer of neurons, and
dictate an order of execution of instructions to the processing unit to perform the computation of the first layer of neurons.
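The pointer swap in claim 40 avoids copying data between layers: the partition that received one layer's output is simply re-labeled as the input partition for the next layer. A host-side sketch of that bookkeeping follows, with the per-layer computation abstracted behind a hypothetical run_layer callback.

    #include <utility>

    // After each layer, swap the partition pointers so the output buffer
    // becomes the next layer's input buffer; no data moves.
    void run_network(float* d_partition1, float* d_partition2, int n,
                     int num_layers,
                     void (*run_layer)(const float* in, float* out, int n)) {
        float* d_in = d_partition1;    // referenced by the first pointer
        float* d_out = d_partition2;   // referenced by the second pointer
        for (int layer = 0; layer < num_layers; ++layer) {
            run_layer(d_in, d_out, n); // compute one layer of neurons
            std::swap(d_in, d_out);    // output becomes the next input
        }
    }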
41. The system of claim 40, wherein the processing unit comprises a graphics processing unit.
42. The system of claim 40, wherein the controller is configured to send instructions for performing the at least one calculation to the processing unit.
43. The system of claim 40, wherein the memory further comprises:
a third partition to store internal variables; and
a fourth partition to store data used as input at a particular layer of neurons of the artificial neural network.
44. A computer system, comprising:
a central processing unit to receive input data acquired from an external system;
main memory, operably coupled to the central processing unit via a bus, to store the input data received by the central processing unit;
an accelerator, operably coupled to the central processing unit and the main memory via the bus, to receive at least a portion of the input data from the main memory, the accelerator comprising:
at least one processing unit to perform a sequence of computations representing an artificial neural network on the at least a portion of the input data so as to generate output data, intermediate computations in the sequence of computations representing layers of the neural network and yielding intermediate results; and
accelerator memory, operably coupled to the at least one processing unit, to store the results of the sequence of computations; and
a controller, operably coupled to the at least one processing unit and the accelerator memory, to control transfer of the at least a portion of the input data into the accelerator memory during performance of the intermediate computations in the sequence of computations by the at least one processing unit, to control transfer of at least a portion of the output data from the accelerator memory to the main memory during performance of the intermediate computations in the sequence of computations by the at least one processing unit, and to control performance of the sequence of computations by the at least one processing unit.
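Because results from earlier steps are read but not modified by later ones, finished outputs can stream back to main memory while the next computation runs, which is the overlap the controller of claim 44 manages. The CUDA sketch below illustrates that overlap under the same read-only assumption; the placeholder kernel and the two-stream split are illustrative, and the synchronization points keep the copy from racing the buffer's reuse.

    #include <cuda_runtime.h>

    __global__ void step(const float* prev, float* curr, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) curr[i] = prev[i] + 1.0f;   // placeholder computation
    }

    // h_out must hold steps * n floats and be pinned for true overlap.
    void run_and_stream_back(float* d_a, float* d_b, float* h_out,
                             int n, int steps) {
        cudaStream_t compute, copy;
        cudaStreamCreate(&compute);
        cudaStreamCreate(&copy);
        float *prev = d_a, *curr = d_b;
        for (int t = 0; t < steps; ++t) {
            cudaStreamSynchronize(copy);  // buffer to overwrite is free
            step<<<(n + 255) / 256, 256, 0, compute>>>(prev, curr, n);
            if (t > 0)  // step t-1's results are final; copy during compute
                cudaMemcpyAsync(h_out + (size_t)(t - 1) * n, prev,
                                n * sizeof(float),
                                cudaMemcpyDeviceToHost, copy);
            cudaStreamSynchronize(compute);
            std::swap(prev, curr);
        }
        cudaMemcpyAsync(h_out + (size_t)(steps - 1) * n, prev,
                        n * sizeof(float), cudaMemcpyDeviceToHost, copy);
        cudaStreamSynchronize(copy);
        cudaStreamDestroy(compute);
        cudaStreamDestroy(copy);
    }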
45. The computer system of claim 44, wherein the central processing unit is configured to receive the input data in response to a user interaction.
46. The computer system of claim 44, wherein:
the central processing unit is configured to receive the input data at a first rate; and
the at least one processing unit is configured to perform the sequence of computations at a second rate different than the first rate.
47. The computer system of claim 44, wherein the main memory is configured to store a copy of the output data stored in the accelerator memory.
48. The computer system of claim 44, wherein an output of at least one computation in the sequence of computations represents an output of at least one neuron in an artificial neural network.
49. The computer system of claim 44, wherein the accelerator memory comprises:
a first memory partition to store parameters common to all of the computations in the sequence of computations; and
a second memory partition to store data specific to at least one computation in the sequence of computations.
50. The computer system of claim 44, wherein the controller is configured to transfer the output data from the accelerator memory to the main memory without transferring any of the intermediate results from the accelerator memory to the main memory so as to reduce data transfer via the bus.
51. The computer system of claim 44, wherein the controller is configured to transfer at least a portion of the output data from the accelerator memory to the main memory after the at least one processing unit has begun to perform another sequence of computations.
52. The computer system of claim 51, wherein the controller is configured to initiate transfer of the at least a portion of the input data and to transfer the at least a portion of the output data in parallel with performance of at least one computation in the other sequence of computations by the at least one processing unit.
53. The computer system of claim 44, wherein the controller is configured to control execution of the sequence of computations by the at least one processing unit.
54. The computer system of claim 44, further comprising:
at least one of a video camera, a microphone, or a cell recording electrode, operably coupled to the central processing unit, to acquire the input data in real time.
55. The computer system of claim 1, wherein the controller is configured to inform the central processing unit that the sequence of computations is finished.
56. The computer system of claim 1, wherein the controller is configured to reduce a processing load on the central processing unit.
57. The computer system of claim 1, wherein the controller is configured to reduce interactions between the central processing unit and the accelerator.
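Claims 55 through 57 give the completion bookkeeping to the controller so the CPU stays free for other tasks. On a stock GPU runtime, a loose analogy (and it is only an analogy, not the claimed controller) is an event recorded after the last enqueued operation, which the CPU can poll without blocking.

    #include <cuda_runtime.h>

    // Enqueue a completion marker after the final computation or copy
    // submitted on `stream`.
    void mark_done(cudaEvent_t done, cudaStream_t stream) {
        cudaEventRecord(done, stream);
    }

    // Non-blocking check the CPU can make between its own tasks; returns
    // true once the whole sequence of computations has finished.
    bool simulation_finished(cudaEvent_t done) {
        return cudaEventQuery(done) == cudaSuccess;
    }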
US15/808,201 | Active | USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
US15/808,201 | USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method
US17/136,343 | USRE49461E1 (en) | 2006-09-25 | 2020-12-29 | Graphic processor based accelerator system and method

Applications Claiming Priority (4)

Application Number | Publication | Priority Date | Filing Date | Title
US82689206P | | 2006-09-25 | 2006-09-25 |
US11/860,254 | US8648867B2 (en) | 2006-09-25 | 2007-09-24 | Graphic processor based accelerator system and method
US14/147,015 | US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method
US15/808,201 | USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method

Related Parent Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US14/147,015 | Reissue | US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US14/147,015 | Continuation | US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method

Publications (1)

Publication Number | Publication Date
USRE48438E1 (en) | 2021-02-16

Family

ID=39416485

Family Applications (4)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US11/860,254 | Active (anticipated expiration 2030-07-20) | US8648867B2 (en) | 2006-09-25 | 2007-09-24 | Graphic processor based accelerator system and method
US14/147,015 | Ceased | US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method
US15/808,201 | Active | USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method
US17/136,343 | Active | USRE49461E1 (en) | 2006-09-25 | 2020-12-29 | Graphic processor based accelerator system and method


Country Status (1)

Country | Link
US (4) | US8648867B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
USRE49461E1 (en) * | 2006-09-25 | 2023-03-14 | Neurala, Inc. | Graphic processor based accelerator system and method





Non-Patent Citations (141)

* Cited by examiner, † Cited by third party
Title
Adelson, E. H., Anderson, C. H., Bergen, J. R., Burt, P. J., & Ogden, J. M. (1984). Pyramid methods in image processing. RCA engineer, 29(6), 33-41.
Aggarwal, Charu C, Hinneburg, Alexander, and Keim, Daniel A. On the surprising behavior of distance metrics in high dimensional space. Springer, 2001.
Al-Kaysi, A. M. et al., A Multichannel Deep Belief Network for the Classification of EEG Data, from Ontology-based Information Extraction for Residential Land Use Suitability: A Case Study of the City of Regina, Canada, DOI 10.1007/978-3-319-26561-2_5, 8 pages (Nov. 2015).
Ames, H, Versace, M., Gorchetchnikov, A., Chandler, B., Livitz, G., Léveillé, J., Mingolla, E., Carter, D., Abdalla, H., and Snider, G. (2012) Persuading computers to act more like brains. In Advances in Neuromorphic Memristor Science and Applications, Kozma, R.Pino,R., and Pazienza, G. (eds), Springer Verlag.
Ames, H. Mingolla, E., Sohail, A., Chandler, B., Gorchetchnikov, A., Léveillé, J., Livitz, G. and Versace, M. (2012) The Animat. IEEE Pulse, Feb. 2012, 3(1), 47-50.
Apolloni, B. et al., Training a network of mobile neurons, Proceedings of International Joint Conference on Neural Networks, San Jose, CA, doi: 10.1109/IJCNN.2011.6033427, pp. 1683-1691 (Jul. 31-Aug. 5, 2011).
Artificial Intelligence As a Service. Invited talk, Defrag, Broomfield, CO, Nov. 4-6 (2013).
Aryananda, L. (2006). Attending to learn and learning to attend for a social robot. Humanoids 06, pp. 618-623.
Baraldi, A. and Alpaydin, E. (1998). Simplified ART: A new class of ART algorithms. International Computer Science Institute, Berkeley, CA, TR-98-004, 1998.
Baraldi, A. and Alpaydin, E. (2002). Constructive feedforward ART clustering networks-Part I. IEEE Transactions on Neural Networks 13(3), 645-661.
Baraldi, A. and Alpaydin, E. (2002). Constructive feedforward ART clustering networks—Part I. IEEE Transactions on Neural Networks 13(3), 645-661.
Baraldi, A. and Parmiggiani, F. (1997). Fuzzy combination of Kohonen's and ART neural network models to detect statistical regularities in a random sequence of multi-valued input patterns. In International Conference on Neural Networks, IEEE.
Baraldi, Andrea and Alpaydin, Ethem. Constructive feedforward ART clustering networks-part II. IEEE Transactions on Neural Networks, 13(3):662-677, May 2002. ISSN 1045-9227. doi: 10.1109/tnn.2002.1000131. URL http://dx.doi.org/10.1109/tnn.2002.1000131.
Baraldi, Andrea and Alpaydin, Ethem. Constructive feedforward ART clustering networks—part II. IEEE Transactions on Neural Networks, 13(3):662-677, May 2002. ISSN 1045-9227. doi: 10.1109/tnn.2002.1000131. URL http://dx.doi.org/10.1109/tnn.2002.1000131.
Bengio, Y., Courville, A., & Vincent, P. Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35 Issue 8, Aug. 2013, pp. 1798-1828.
Berenson, D. et al., A robot path planning framework that learns from experience, 2012 International Conference on Robotics and Automation, 2012, 9 pages [retrieved from the internet] URL:http://users.wpi.edu/-dberenson/lightning.pdf.
Bernhard, F., and Keriven, R. (2005). Spiking Neurons on GPUs. Tech. Rep. 05-15, Ecole Nationale des Ponts et Chauss'es, 8 pages.
Besl, P. J., & Jain, R. C. (1985). Three-dimensional object recognition. ACM Computing Surveys (CSUR), 17(1), 75-145.
Boddapati, V., Classifying Environmental Sounds with Image Networks, Thesis, Faculty of Computing Blekinge Institute of Technology, 37 pages. (Feb. 2017).
Bohn, C.-A. Kohonen. (1998). Feature Mapping Through Graphics Hardware. In Proceedings of 3rd Int. Conference on Computational Intelligence and Neurosciences, 4 pages.
Bradski, G., & Grossberg, S. (1995). Fast-learning Viewnet architectures for recognizing three-dimensional objects from multiple two-dimensional views. Neural Networks, 8 (7-8), 1053-1080.
Canny, J.A. (1986). Computational Approach to Edge Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):679-698.
Carpenter, G.A. and Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing 37, 54-115.
Carpenter, G.A., and Grossberg, S. (1995). Adaptive resonance theory (ART). In M. Arbib (Ed.), The handbook of brain theory and neural networks. (pp. 79-82). Cambridge, M.A.: MIT press.
Carpenter, G.A., Grossberg, S. and Rosen, D.B. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks 4, 759-771.
Carpenter, Gail A and Grossberg, Stephen. The art of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3):77-88, 1988.
Coifman, R.R. and Maggioni, M. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53-94, 2006.
Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Nadler, B., Warner, F., and Zucker, S.W. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426, 2005.
Cornwall et al., "Automatically Translating a General Purpose C++ Image Processing Library for GPUs", IEEE, Jun. 2006, 8 pages. (Year: 2006).*
Davis, C. E. 2005. Graphic Processing Unit Computation of Neural Networks. Master's thesis, University of New Mexico, Albuquerque, NM, 121 pages.
Dosher, B.A., and Lu, Z.L. (2010). Mechanisms of perceptual attention in precuing of location. Vision Res., 40(10-12). 1269-1292.
Ellias, S. A., and Grossberg, S. (1975). Pattern formation, contrast control and oscillations in the short term memory of shunting on-center off-surround networks. Biol Cybern 20, pp. 69-98.
Extended European Search Report and Written Opinion dated Jun. 1, 2017 from European Application No. 14813864.7, 10 pages.
Extended European Search Report and Written Opinion dated Oct. 12, 2017 from European Application No. 14800348.6, 12 pages.
Extended European Search Report and Written Opinion Oct. 23, 2017 from European Application No. 15765396.5, 8 pages.
Fazl, A., Grossberg, S., and Mingolla, E. (2009). View-invariant object category learning, recognition, and search: How spatial and object attention are coordinated using surface-based attentional shrouds. Cognitive Psychology 58, 1-48.
Földiák, P. (1990). Forming sparse representations by local anti-Hebbian learning, Biological Cybernetics, vol. 64, pp. 165-170.
Friston K., Adams R., Perrinet L., & Breakspear M. (2012). Perceptions as hypotheses: saccades as experiments. Frontiers in Psychology, 3 (151), 1-20.
Galbraith, B.V, Guenther, F.H., and Versace, M. (2015) A neural network-based exploratory learning and motor planning system for co-robots.Frontiers in Neuroscience, in press.
George, D. and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology 5(10), 1-26.
Georgeii, J., and Westermann, R. (2005). Mass-spring systems on the GPU. Simulation Modelling Practice and Theory 13, pp. 693-702.
Gorchetchnikov A. An Approach to a Biologically Realistic Simulation of Natural Memory. Master's thesis, Middle Tennessee State University, Murfreesboro, TN, 70 pages.
Gorchetchnikov A., Hasselmo M.E. (2005). A biophysical implementation of a bidirectional graph search algorithm to solve multiple goal navigation tasks. Connection Science, 17(1-2), pp. 145-166.
Gorchetchnikov A., Hasselmo M.E. (2005). A simple rule for spike-timing-dependent plasticity: local influence of AHP current. Neurocomputing, 65-66, pp. 885-890.
Gorchetchnikov A., Versace, M., Hasselmo M.E. (2005). A Model of STDP Based on Spatially and Temporally Local Information: Derivation and Combination with Gated Decay. Neural Networks, 18, pp. 458-466.
Gorchetchnikov A., Versace, M., Hasselmo M.E. (2005). Spatially and temporally local spiketiming-dependent plasticity rule. In: Proceedings of the International Joint Conference on Neural Networks, No. 1568 in IEEE CD-ROM Catalog No. 05CH37662C, pp. 390-396.
Grossberg, S. (1973). Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics 52, 213-257.
Grossberg, S., and Huang, T.R. (2009). Artscene: A neural system for natural scene classification. Journal of Vision, 9 (4), 6.1-19. doi:10.1167/9.4.6.
Grossberg, S., and Versace, M. (2008) Spikes, synchrony, and attentive learning by laminar thalamocortical circuits. Brain Research, 1218C, 278-312 [Authors listed alphabetically].
Hagen, T.R., Hjelmervik, J., Lie, K.-A., Natvig, J. and Ofstad Henriksen, M. (2005). Visual simulation of shallow-water waves. Simulation Modelling Practice and Theory 13, pp. 716-726.
Hasselt, Hado Van. Double q-learning. In Advances in Neural Information Processing Systems, pp. 2613-2621, 2010.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
Hodgkin, A.L., and Huxley, A.F. (1952). Quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117, pp. 500-544.
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. In Proc Natl Acad Sci USA, vol. 79, pp. 2554-2558.
Ilie, A. (2002). Optical character recognition on graphics hardware. Tech. Rep. integrative paper, UNCCH, Department of Computer Science, 9 pages.
International Preliminary Report on Patentability dated Nov. 8, 2016 from International Application No. PCT/US2015/029438, 7 pages.
International Preliminary Report on Patentability in related PCT Application No. PCT/US2014/039162 filed May 22, 2014, dated Nov. 24, 2015, 7 pages.
International Preliminary Report on Patentability in related PCT Application No. PCT/US2014/039239 filed May 22, 2014, dated Nov. 24, 2015, 8 pages.
International Search Report and Written Opinion dated Feb. 18, 2015 from International Application No. PCT/US2014/039162, 12 pages.
International Search Report and Written Opinion dated Feb. 23, 2016 from International Application No. PCT/US2015/029438, 11 pages.
International Search Report and Written Opinion dated Jul. 6, 2017 from International Application No. PCT/US2017/029866, 12 pages.
International Search Report and Written Opinion dated Nov. 26, 2014 from International Application No. PCT/US2014/039239, 14 pages.
International Search Report and Written Opinion dated Sep. 15, 2015 from International Application No. PCT/US2015/021492, 9 pages.
Itti, L., and Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2 (3), 194-203.
Itti, L., Koch, C., and Niebur, E. (1998). A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, 1-6.
Jarrett, K., Kavukcuoglu, K., Ranzato, M. A., & LeCun, Y. (Sep. 2009). What is the best multi-stage architecture for object recognition?. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 2146-2153). IEEE.
Khaligh-Razavi, S.-M. et al., Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation, PLoS Computational Biology, vol. 10, Issue 11, 29 pages. (Nov. 2014).
Kim, S., Novel approaches to clustering, biclustering and algorithms based on adaptive resonance theory and intelligent control, Doctoral Dissertations, Missouri University of Science and Technology, 125 pages. (2016).
Kipfer, P., Segal, M., and Westermann, R. (2004). UberFlow: A GPU-Based Particle Engine. In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004, pp. 115-122.
Kolb, A., L. Latta, and C. Rezk-Salama. (2004). "Hardware-Based Simulation and Collision Detection for Large Particle Systems." in Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004, pp. 123-131.
Kompella, Varun Raj, Luciw, Matthew, and Schmidhuber, Jürgen. Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams. Neural Computation, 24(11):2994-3024, 2012.
Kowler, E. (2011). Eye movements: The past 25years. Vision Research, 51(13), 1457-1483. doi:10.1016/j.visres.2010.12.014.
Larochelle H., & Hinton G. (2012). Learning to combine foveal glimpses with a third-order Boltzmann machine. NIPS 2010,1243-1251.
LeCun, Y., Kavukcuoglu, K., & Farabet, C. (May 2010). Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on (pp. 253-256). IEEE.
Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791.
Lee, D. D., and Seung, H. S. (1997). "Unsupervised learning by convex and conic coding." Advances in Neural Information Processing Systems, 9.
Legenstein, R., Wilbert, N., and Wiskott, L. Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology, 6(8), 2010. ISSN 1553-734X.
Léveillé, J., Ames, H., Chandler, B., Gorchetchnikov, A., Mingolla, E., Patrick, S., and Versace, M. (2010) Learning in a distributed software architecture for large-scale neural modeling. BIONETICS10, Boston, MA, USA.
Livitz G., Versace M., Gorchetchnikov A., Vasilkoski Z., Ames H., Chandler B., Leveille J. and Mingolla E. (2011) Adaptive, brain-like systems give robots complex behaviors, The Neuromorphic Engineer, : 10.2417/1201101.003500 Feb. 2011. 3 pages.
Livitz, G., Versace, M., Gorchetchnikov, A., Vasilkoski, Z., Ames, H., Chandler, B., Léveillé, J., Mingolla, E., Snider, G., Amerson, R., Carter, D., Abdalla, H., and Qureshi, S. (2011) Visually-Guided Adaptive Robot (ViGuAR). Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2011, San Jose, CA, USA.
Lowe, D.G.(2004). Distinctive Image Features from Scale-Invariant Keypoints. Journal International Journal of Computer Vision archive vol. 60, 2, 91-110.
Lu, Z.L., Liu, J., and Dosher, B.A. (2010) Modeling mechanisms of perceptual learning with augmented Hebbian re-weighting. Vision Research, 50(4). 375-390.
Luo et al., "Artificial Neural Network Computation on Graphic Process Unit", IEEE, "Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, Jul. 31-Aug. 4, 2005", Feb. 2005, pp. 622-626 (Year: 2005).*
Mahadevan, S. Proto-value functions: Developmental reinforcement learning. In Proceedings of the 22nd international conference on Machine learning, pp. 553-560. ACM, 2005.
Meuth, J.R. and Wunsch, D.C. (2007) A Survey of Neural Computation on Graphics Processing Hardware. 22nd IEEE International Symposium on Intelligent Control, Part of IEEE Multi-conference on Systems and Control, Singapore, Oct. 1-3, 2007, 5 pages.
Mishkin M, Ungerleider LG. (1982). "Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys," Behav Brain Res, 6 (1): 57-77.
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, Feb. 25, 2015.
Montrym et al., "The GeForce 6800", IEEE, 2005, 11 pages. (Year: 2005).*
Moore, Andrew W and Atkeson, Christopher G. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103-130, 1993.
Najemnik, J., and Geisler, W. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research. 49, 1286-1294.
Non-Final Office Action dated Jan. 4, 2018 from U.S. Appl. No. 15/262,637, 23 pages.
Notice of Alllowance dated May 22, 2018 from U.S. Appl. No. 15/262,637, 6 pages.
Notice of Allowance dated Dec. 16, 2016 from U.S. Appl. No. 14/662,657.
Notice of Allowance dated Jul. 27, 2016 from U.S. Appl. No. 14/662,657.
Oh, K.-S., and Jung, K. (2004). GPU implementation of neural networks. Pattern Recognition 37, pp. 1311-1314.
Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15(3), 267-273.
Partial Supplementary European Search Report dated Jul. 4, 2017 from European Application No. 14800348.6, 13 pages.
Perumalla, Kalyan S., "Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)", IEEE, "Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation (PADS'06)", Mar. 2006, 8 pages. (Year: 2006).*
Raijmakers, M.E.J., and Molenaar, P. (1997). Exact ART: A complete implementation of an ART network Neural networks 10 (4), 649-669.
Ranzato, M. A., Huang, F. J., Boureau, Y. L., & Lecun, Y. (Jun. 2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on (pp. 1-8). IEEE.
Raudies, F., Eldridge, S., Joshi, A., and Versace, M. (Aug. 20, 2014). Learning to navigate in a virtual world using optic flow and stereo disparity signals. Artificial Life and Robotics, DOI 10.1007/s10015-014-0153-1.
Ren, Y. et al., Ensemble Classification and Regression-Recent Developments, Applications and Future Directions, in IEEE Computational Intelligence Magazine, 10.1109/MCI.2015.2471235, 14 pages (2016).
Ren, Y. et al., Ensemble Classification and Regression—Recent Developments, Applications and Future Directions, in IEEE Computational Intelligence Magazine, 10.1109/MCI.2015.2471235, 14 pages (2016).
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2 (11), 1019-1025.
Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature neuroscience, 3, 1199-1204.
Rolfes, T. (2004). Artificial Neural Networks on Programmable Graphics Hardware. In Game Programming Gems 4, A. Kirmse, Ed. Charles River Media, Hingham, MA, pp. 373-378.
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In IEEE International Conference on Computer Vision (ICCV) 2011, 2564-2571.
Ruesch, J. et al. (2008). Multimodal Saliency-Based Bottom-Up Attention: A Framework for the Humanoid Robot iCub. 2008 IEEE International Conference on Robotics and Automation, pp. 962-965.
Rumelhart D., Hinton G., and Williams, R. (1986). Learning internal representations by error propagation. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, MIT Press.
Rumpf, M. and Strzodka, R. Graphics processor units: New prospects for parallel computing. In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, vol. 51 of Lecture Notes in Computational Science and Engineering, pp. 89-134. Springer, 2005.
Salakhutdinov, R., & Hinton, G. E. (2009). Deep boltzmann machines. In International Conference on Artificial Intelligence and Statistics (pp. 448-455).
Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv:1511.05952, Nov. 18, 2015.
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990-2010). Autonomous Mental Development, IEEE Transactions on, 2(3), 230-247.
Schmidhuber, Jürgen. Curious model-building control systems. In Neural Networks, 1991. 1991 IEEE International Joint Conference on, pp. 1458-1463. IEEE, 1991.
Seibert, M., & Waxman, A.M. (1992). Adaptive 3-D Object Recognition from Multiple Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (2), 107-124.
Setoain et al., "Parallel Hyperspectral Image Processing on Commodity Graphics Hardware", IEEE, "Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPPW'06)", Mar. 2006, 8 pages. (Year: 2006).*
Sherbakov, L. and Versace, M. (2014) Computational principles for an autonomous active vision system. Ph.D. dissertation, Boston University, http://search.proquest.com/docview/1558856407.
Sherbakov, L. et al. 2012. CogEye: from active vision to context identification. YouTube, retrieved from the Internet on Oct. 10, 2017: www.youtube.com/watch?v=i5PQk962B1k, 1 page.
Sherbakov, L. et al. 2013. CogEye: system diagram module brain area function algorithm approx # neurons, retrieved from the Internet on Oct. 12, 2017: http://www-labsticc.univ-ubs.fr/~coussy/neucomp2013/index_fichiers/material/posters/NeuComp2013_final56x36.pdf, 1 page.
Sherbakov, L., Livitz, G., Sohail, A., Gorchetchnikov, A., Mingolla, E., Ames, H., and Versace, M (2013b) A computational model of the role of eye-movements in object disambiguation. Cosyne, Feb. 28-Mar. 3, 2013. Salt Lake City, UT, USA.
Sherbakov, L., Livitz, G., Sohail, A., Gorchetchnikov, A., Mingolla, E., Ames, H., and Versace, M. (2013a) CogEye: An online active vision system that disambiguates and recognizes objects. NeuComp 2013.
Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pp. 194-281. MIT Press.
Snider, Greg, et al. "From synapses to circuitry: Using memristive memory to explore the electronic brain." IEEE Computer, vol. 44(2) (2011): 21-28.
Spratling, M. W. (2008). Predictive coding as a model of biased competition in visual attention. Vision Research, 48(12):1391-1408.
Spratling, M. W. (2012). Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. Neural Computation, 24(1):60-103.
Spratling, M. W., De Meyer, K., and Kompass, R. (2009). Unsupervised learning of overlapping image components using divisive input modulation. Computational intelligence and neuroscience.
Sprekeler, H. On the relation of slow feature analysis and laplacian eigenmaps. Neural Computation, pp. 1-16, 2011.
Sun, Z. et al., Recognition of SAR target based on multilayer auto-encoder and SNN, International Journal of Innovative Computing, Information and Control, vol. 9, No. 11, pp. 4331-4341, Nov. 2013.
Sutton, Richard S and Barto, Andrew G. Reinforcement learning: An introduction. MIT press, 1998.
Tong, F., Ze-Nian Li (1995). Reciprocal-wedge transform for space-variant sensing. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, no. 5, pp. 500-511. doi: 10.1109/34.391393.
Torralba, A., Oliva, A., Castelhano, M.S., Henderson, J.M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113(4), 766-786.
Van Hasselt, Hado, Guez, Arthur, and Silver, David. Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461, Sep. 22, 2015.
Versace, Brain-inspired computing. Invited keynote address, Bionetics 2010, Boston, MA, USA. 1 page.
Versace, M. (2006) From spikes to interareal synchrony: how attentive matching and resonance control learning and information processing by laminar thalamocortical circuits. NSF Science of Learning Centers PI Meeting, Washington, DC, USA.
Versace, M., (2010) Open-source software for computational neuroscience: Bridging the gap between models and behavior. In Horizons in Computer Science Research, vol. 3.
Versace, M., Ames, H., Léveillé, J., Fortenberry, B., and Gorchetchnikov, A. (2008) KInNeSS: A modular framework for computational neuroscience. Neuroinformatics, 2008 Winter; 6(4):291-309. Epub Aug. 10, 2008.
Versace, M., and Chandler, B. (2010) MoNETA: A Mind Made from Memristors. IEEE Spectrum, Dec. 2010.
Versace, TEDx Fulbright, Invited talk, Washington DC, Apr. 5, 2014. 30 pages.
Webster, Bachevalier, Ungerleider (1994). Connections of IT areas TEO and TE with parietal and frontal cortex in macaque monkeys. Cerebral Cortex, 4(5), 470-483.
Wiskott, Laurenz and Sejnowski, Terrence. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715-770, 2002.
Wu, Y. and Cai, H. J. (2010). A Simulation Study of Deep Belief Network Combined with the Self-Organizing Mechanism of Adaptive Resonance Theory. 10.1109/CISE.2010.5677265, 4 pages.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
USRE49461E1 (en)* | 2006-09-25 | 2023-03-14 | Neurala, Inc. | Graphic processor based accelerator system and method

Also Published As

Publication number | Publication date
US20080117220A1 (en) | 2008-05-22
US9189828B2 (en) | 2015-11-17
USRE49461E1 (en) | 2023-03-14
US8648867B2 (en) | 2014-02-11
US20140192073A1 (en) | 2014-07-10

Similar Documents

Publication | Title
USRE49461E1 (en) | Graphic processor based accelerator system and method
EP2939208B1 (en) | Sprite graphics rendering system
US11526964B2 (en) | Deep learning based selection of samples for adaptive supersampling
Blythe | Rise of the graphics processor
JP5345226B2 (en) | Graphics processor parallel array architecture
US7907143B2 (en) | Interactive debugging and monitoring of shader programs executing on a graphics processor
TWI498819B (en) | System and method for performing shaped memory access operations
US7058945B2 (en) | Information processing method and recording medium therefor capable of enhancing the executing speed of a parallel processing computing device
US11837195B2 (en) | Apparatus and method for command stream optimization and enhancement
US20110238955A1 (en) | Methods for scalably exploiting parallelism in a parallel processing system
JP2020537785A (en) | Multi-layer neural network processing with a neural network accelerator using merged weights and a package of layered instructions to be hosted
CN113610697A (en) | Scalable sparse matrix multiplication acceleration using systolic arrays with feedback inputs
CN101802789A (en) | Parallel runtime execution on multiple processors
US7750915B1 (en) | Concurrent access of data elements stored across multiple banks in a shared memory resource
CN113448759A (en) | High speed recovery of GPU applications
GB2489526A (en) | Representing and calculating with sparse matrixes in simulating incompressible fluid flows
US9513923B2 (en) | System and method for context migration across CPU threads
Fatahalian et al. | GPUs: A Closer Look: As the line between GPUs and CPUs begins to blur, it's important to understand what makes GPUs tick
JP2023004864A (en) | Use of sparsity metadata for reducing systolic array power consumption
CN119151767A (en) | Dynamic accumulator allocation
KR102209526B1 (en) | Method and apparatus for analysis protein-ligand interaction using parallel operation
US7475001B2 (en) | Software package definition for PPU enabled system
US8219371B1 (en) | Computing inter-atom forces between atoms in a protein
US9465666B2 (en) | Game engine and method for providing an extension of the VSIPL++ API
US9542192B1 (en) | Tokenized streams for concurrent execution between asymmetric multiprocessors

Legal Events

Code | Title | Description

FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS | Assignment | Owner name: NEURALA LLC, MASSACHUSETTS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; Assignors: GORCHETCHNIKOV, ANATOLI; AMES, HEATHER MARIE; VERSACE, MASSIMILIANO; AND OTHERS; Reel/Frame: 044087/0788; Effective date: 20071012

AS | Assignment | Owner name: NEURALA, INC., MASSACHUSETTS; Free format text: ENTITY CONVERSION; Assignor: NEURALA LLC; Reel/Frame: 044818/0955; Effective date: 20130221

FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 8

AS | Assignment | Owner name: NEURAL AI, LLC, TEXAS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; Assignor: NEURALA, INC.; Reel/Frame: 068567/0282; Effective date: 20240722

IPR | AIA trial proceeding filed before the Patent Trial and Appeal Board: inter partes review | Free format text: TRIAL NO: IPR2025-00610; Opponent name: NVIDIA CORPORATION; Effective date: 20250430

IPR | AIA trial proceeding filed before the Patent Trial and Appeal Board: inter partes review | Free format text: TRIAL NO: IPR2025-00609; Opponent name: NVIDIA CORPORATION; Effective date: 20250430

