CN105510961A - System and method for processing prestack reverse-time offset data - Google Patents

System and method for processing prestack reverse-time offset data

Info

Publication number
CN105510961A
Authority
CN
China
Prior art keywords
data
result
processing
overlapped
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410513746.2A
Other languages
Chinese (zh)
Inventor
张慧宇
孔祥宁
李博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Original Assignee
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp and Sinopec Geophysical Research Institute
Priority to CN201410513746.2A
Publication of CN105510961A
Legal status: Pending

Abstract

The invention discloses a system for processing prestack reverse-time migration data. The system comprises a plurality of computing modules, each of which processes one part of the prestack reverse-time migration data, and a data relay module, connected to each computing module, which realizes the data exchange among the computing modules. In the corresponding method, the prestack reverse-time migration data to be processed are acquired; the data are divided into a plurality of data units; and all the data units are processed in parallel at the same time, each data unit being one processing object, to obtain the final processing result. Compared with the prior art, processing data with this method and system avoids the limits that GPU video memory places on data volume and computational load, greatly improves data processing efficiency, and further shortens the time a single GPU spends processing data.

Description

System and method for processing prestack reverse-time migration data
Technical field
The present invention relates to the field of geological exploration, and in particular to a system and method for processing prestack reverse-time migration data.
Background technology
Prestack reverse-time migration (RTM) is a computation-intensive and data-intensive seismic migration imaging method. The rapid development of graphics processing unit (GPU) technology in recent years has provided a powerful tool for prestack depth migration and has substantially improved the efficiency of processing prestack RTM data. However, the limited video memory of a single GPU (typically 4-8 GB) constrains the data volume and computational load of the prestack RTM data that the GPU can handle.
To process prestack RTM data of larger volume and computational load, and to further improve processing efficiency, a new system and method for processing prestack RTM data are needed.
Summary of the invention
To address the problems of existing processing methods, the invention provides a method for processing prestack reverse-time migration (RTM) data, the method comprising the following steps:
Step 1: acquiring the prestack RTM data to be processed;
Step 2: dividing the prestack RTM data into a plurality of data units;
Step 3: processing all the data units in parallel at the same time, each data unit being one processing object, to obtain the processing result of each data unit and, from these, the processing result of the prestack RTM data.
In one embodiment, in step 2 the number of data units is determined dynamically according to the computational requirements.
In one embodiment, step 3 comprises the following steps:
a data division step, in which the data unit is divided into independent data and overlapped data;
an independent data processing step, in which the independent data are processed to obtain an independent data result;
an overlapped data processing step, in which the overlapped data are processed to obtain an overlapped data result.
In one embodiment, the overlapped data processing step comprises the following steps:
a first overlap processing step, in which the overlapped data are processed to obtain a first overlap processing result;
a data exchange step, in which the corresponding first overlap processing results of different data units are exchanged to obtain a second overlap processing result.
In one embodiment, in step 3 the data exchange step is carried out while the independent data processing step is in progress.
In one embodiment, the overlapped data processing step further comprises a second overlap processing step, in which the second overlap processing result is processed to obtain the overlapped data result.
In one embodiment, the second overlap processing step is performed after the independent data processing step has finished.
In one embodiment, step 3 further comprises a data unit result acquisition step, in which the processing result of the data unit is obtained from the independent data result and the overlapped data result.
The invention also provides a system for processing prestack reverse-time migration (RTM) data, the system comprising:
a plurality of computing modules, each computing module processing one data unit of the prestack RTM data to be processed;
a data relay module, connected to each computing module, for realizing the data exchange among the computing modules.
In one embodiment, each computing module is built on one graphics processor.
Compared with the prior art, processing data with the method and system of the invention not only avoids the limits that GPU video memory places on data volume and computational load, but also greatly improves data processing efficiency and further shortens the time a single GPU spends processing data.
Other features and advantages of the invention are set forth in the following description, become apparent from the description, or are learned by practicing the invention. The objects and certain advantages of the invention can be realized or attained through the steps particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
The drawings provide a further understanding of the invention and form part of the description; together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart according to an embodiment of the invention;
Fig. 2 is a schematic diagram of data unit division according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the time steps of prior-art data processing;
Fig. 4 is a schematic diagram of the time steps of data processing according to an embodiment of the invention;
Fig. 5 is a partial flowchart according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that those implementing the invention can fully understand how it applies technical means to solve technical problems and achieve technical effects, and can implement the invention accordingly. Provided that no conflict arises, the embodiments of the invention and the features within them may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.
In the prior art, when the volume of prestack reverse-time migration (RTM) data is large, a single GPU cannot meet the data volume and computational requirements because its video memory is insufficient. To solve this problem, the invention proposes a method and system for processing prestack RTM data: it uses the GPU's parallel computing framework, the Compute Unified Device Architecture (CUDA), in a multi-card combined processing mode to overcome the limited video memory capacity of a single GPU. The implementation of the method is described in detail below with reference to the flowcharts.
The steps shown in the flowcharts may be performed in a computer system executing, for example, a set of computer-executable instructions. Although a logical order of the steps is shown in the flowcharts, in some cases the steps may be performed in an order different from that shown or described here.
As shown in Fig. 1, step S110 is performed first to acquire the prestack RTM data to be processed. Step S120 then divides the data into a plurality of data units, so that the data volume and computational load of an individual data unit are greatly reduced relative to the prestack RTM data as a whole. In this embodiment the number of data units is determined dynamically according to the current computational requirements. For the divided data units, the processing system of the invention is configured with a plurality of computing modules, each of which processes one data unit.
In this embodiment each computing module is built on one GPU card. With multiple GPUs, each processing one data unit, the CUDA architecture lets all GPUs execute their data processing in parallel, i.e. the processing of all data units proceeds simultaneously. This not only solves the problem that a single GPU cannot accommodate the overall data volume and computational load of the prestack RTM data, but also greatly improves computational efficiency.
In this embodiment the prestack RTM data are divided into n data units, each GPU card processing one data unit. Step S130, parallel data processing, can then be performed: all data units are processed in parallel, each unit being one processing object, to obtain the processing result of each data unit and, from these, the processing result of the prestack RTM data.
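The division into n data units with an overlap between neighbours (Fig. 2) can be sketched on the CPU side as follows. This is an illustrative model, not the patent's implementation: the 1-D index range stands in for a shot data volume, and the function name, unit count and halo width are invented for the example.

```python
# Hypothetical sketch: split one shot volume (modelled as a 1-D index range)
# into n data units whose adjacent ranges overlap, as in Fig. 2.

def split_into_units(total_size, n_units, halo):
    """Return (start, end) index ranges; adjacent units overlap by 2*halo."""
    base = total_size // n_units
    units = []
    for i in range(n_units):
        start = max(0, i * base - halo)           # extend left into neighbour
        end = min(total_size, (i + 1) * base + halo)  # extend right
        units.append((start, end))
    return units

units = split_into_units(total_size=1200, n_units=4, halo=50)
```

Each card would then be handed one of these ranges; the overlapping samples are the ones that later require exchange between cards.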
Of course, before the GPUs are used for data processing, each GPU device must be initialized: device selection, output transfer, thread-grid configuration and similar setup, after which multi-card GPU parallel processing starts. These preparatory steps are not elaborated further in this description.
Compared with processing the entire prestack RTM dataset on a single GPU card, reverse-time migration under the multi-card parallel mode is correspondingly more complex. Prestack RTM data consist of many shot data volumes, each of which is computed separately. Combining several cards on one shot data volume means computing that volume with several GPU cards: the shot data volume is divided into n data units (card 1, card 2, ..., card n) in the manner shown in Fig. 2. Because of the nature of the shot data itself, adjacent data units have an overlapping region when the data units are divided, as shown in Fig. 2. Under the parallel processing mode, data must therefore be transferred during computation not only between the GPUs and the host but also between the GPU cards. Since the GPU cards cannot transfer data to one another directly, the system of the invention is also configured with a data relay module, connected to each computing module (GPU card), to carry out the data exchange between two computing modules.
In this embodiment the host serves as the data relay module, and the GPU cards exchange data through the host as a bridge. To handle prestack RTM data of arbitrary scale, the system of the invention uses GPU cards supporting direct memory access (DMA) or remote direct memory access (RDMA). DMA can read and write CUDA host memory directly, eliminating unnecessary system-memory copies and central processing unit (CPU) overhead; in a NUMA-like fashion it also supports peer-to-peer access, in which one GPU directly accesses another GPU's memory, and direct DMA transfers between GPUs.
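The host-relay exchange can be modelled on the CPU side as below. This is a minimal sketch, assuming a simple swap of edge samples; the staging dictionary stands in for the host buffer that the device-to-host and host-to-device copies would pass through, and all names are hypothetical.

```python
# Illustrative CPU-side model of relaying halo data between two compute
# modules through host memory, the role the patent assigns to the data
# relay module.

def exchange_halos_via_host(left_unit, right_unit, halo):
    """Swap the overlapping edge samples of two neighbouring units via a
    host-side staging buffer (stands in for GPU->host->GPU copies)."""
    host_staging = {
        "from_left": left_unit[-halo:],    # device-to-host copy, left card
        "from_right": right_unit[:halo],   # device-to-host copy, right card
    }
    # host-to-device copies complete the exchange
    right_new = host_staging["from_left"] + right_unit[halo:]
    left_new = left_unit[:-halo] + host_staging["from_right"]
    return left_new, right_new

left_new, right_new = exchange_halos_via_host([1, 2, 3, 4], [5, 6, 7, 8], halo=1)
```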
On the basis of the Message Passing Interface (MPI), the invention also designs a data exchange mechanism with CPU memory as its endpoint; this mechanism does not limit the number of GPU cards required for each processing job. In this embodiment the number of GPU cards needed is planned dynamically according to the computational requirements of the current processing job. The simplest rule is to divide the data size of the current shot by the video memory capacity of a single GPU card to obtain the number of GPU cards required.
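The sizing rule just stated amounts to a ceiling division; the sketch below uses illustrative sizes (a 30 GB shot on 8 GB cards), not figures from the patent.

```python
import math

# Minimal sketch of the card-count rule: shot data size divided by the
# per-card video memory capacity, rounded up. Values are illustrative.

def gpus_needed(shot_data_bytes, vram_bytes_per_card):
    return math.ceil(shot_data_bytes / vram_bytes_per_card)

cards = gpus_needed(shot_data_bytes=30 * 2**30, vram_bytes_per_card=8 * 2**30)
```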
Although multi-card GPU processing overcomes the shortage of video memory on a single GPU, the data exchange it introduces increases the communication volume. In the processing system of this embodiment, each data exchange between GPU cards involves four data transfers between GPU and CPU plus a transfer within CPU memory, which considerably increases the input/output communication volume of the whole system. Since data transfers between CPU and GPU carry a certain access latency, and since computation and transfer are normally executed serially, an increase in communication volume translates directly into a longer final processing time.
The processing time needed for one round of computation plus the corresponding data transfer is defined as one time step. Fig. 3 is a schematic diagram of serial data processing on two GPU cards over two time steps. In Fig. 3, the rectangles labelled 311, 321, 313 and 323 represent the time consumed by the computation processes, and the rectangles labelled 312, 322, 314 and 324 the time consumed by the data exchange processes. Each GPU executes its computation and its data exchange serially, while GPU1 and GPU2 process data in parallel. Time step 1 therefore equals the sum of the execution times represented by 311 and 312 (or 321 and 322), and time step 2 the sum of those represented by 313 and 314 (or 323 and 324).
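The serial schedule of Fig. 3 can be written as a one-line cost model; the function name is ours, and the example plugs in the dataset-1 figures from Table 1 (28 s compute, 12.5 s communication) purely for illustration.

```python
# Illustrative model of the serial schedule of Fig. 3: within one time step
# each GPU computes first and exchanges afterwards, so the step costs their
# sum.

def serial_time_step(compute_s, exchange_s):
    """Length of one time step when computation and exchange run serially."""
    return compute_s + exchange_s

step = serial_time_step(compute_s=28.0, exchange_s=12.5)
```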
To shorten each time step, this embodiment further divides the data to be processed on each GPU. Within a single GPU, the data processing follows the detailed flow inside dashed box A of Fig. 1. Step S131, the data division step, is performed first: the data to be processed on each GPU card are divided into independent data, which can be computed independently, and overlapped data, which require data exchange. The independent data and the overlapped data are then processed separately to obtain the independent data result and the overlapped data result.
After step S131 completes, the independent data are handled by step S135, the independent data processing step, whose entire computation can finish within one GPU. The overlapped data first go through step S132, the first overlap processing step, which obtains the first overlap processing result from the overlap computation data; then step S133, the data exchange step, which exchanges the corresponding first overlap processing results of different data units to obtain the second overlap processing result; and finally step S134, the second overlap processing step, which obtains the overlapped data result from the second overlap processing result.
With the overlapped data result from step S134 and the independent data result from step S135, step S136, the data unit result acquisition step, obtains the processing result of the data unit from the independent data result and the overlapped data result.
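The classification performed by step S131 can be sketched as an index partition. This is an illustrative CPU-side model with invented names: the edges of a unit's range that overlap its neighbours become the overlapped data, and the interior becomes the independent data.

```python
# Hypothetical sketch of step S131: split one data unit's index range into
# overlapped (halo) samples, which need exchange, and independent samples,
# which can be computed entirely within one GPU. Halo widths are per side.

def partition_unit(unit_len, halo_left, halo_right):
    overlap = list(range(0, halo_left)) + list(range(unit_len - halo_right, unit_len))
    independent = list(range(halo_left, unit_len - halo_right))
    return overlap, independent

overlap, independent = partition_unit(unit_len=10, halo_left=2, halo_right=3)
```

A unit at the edge of the shot volume would simply pass a halo width of zero on its outer side.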
Based on the above processing flow, and in order to optimize the data communication between GPU cards and reduce the overall processing time, the data processing method of the invention adopts a data exchange hiding strategy. In this embodiment, GPU stream technology is used to hide the data exchange. A GPU can create multiple streams during computation; each stream is a sequence of operations executed in order, while different streams may execute out of order or in parallel with respect to one another. The computation of one stream can thus run at the same time as the data transfer of another, improving the utilization of resources within the GPU.
In this embodiment two streams are created, one for the processing of the independent data and one for the overlapped data: steps S132 and S133 belong to one stream, and step S135 to the other. Step S133 is executed while step S135 runs, which hides the execution time of step S133. Because the execution time of step S135 exceeds that of step S133, step S134 follows step S135, and step S136 follows step S134 as the last operation of the data processing. Which stream steps S134 and S136 belong to does not affect the data exchange hiding strategy of the invention. For ease of description, in Fig. 4 (described below) the execution periods of steps S134 and S136 are merged into that of step S135 and are not shown separately.
Comparing Fig. 3 with Fig. 4, computation process 311 is divided into two computation processes, 411 and 412: process 411 corresponds to step S132 in Fig. 1, and process 412 to steps S135, S134 and S136. Similarly, computation process 313 is divided into processes 413 and 414, process 321 into 421 and 422, and process 323 into 423 and 424. The data exchange processes 312, 314, 322 and 324 of Fig. 3 are not further divided and become the data exchange processes 431, 432, 433 and 434 of Fig. 4.
As shown in Fig. 4, for GPU1 in time step 1, computation process 411 and data exchange process 431 belong to one stream, and computation process 412 to the other. GPU1 first performs computation process 411; once it completes, computation process 412 and data exchange process 431 start simultaneously. Time step 1 of GPU1 is therefore the execution time of process 411 plus that of process 412, which equals the execution time of computation process 311 in Fig. 3; compared with Fig. 3, the execution time of data exchange process 312 is hidden. Time step 2 of GPU1, and time steps 1 and 2 of GPU2, are analogous and are not repeated here.
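The Fig. 4 schedule admits the same kind of toy cost model as the serial case. The split of a 28 s computation into a 3 s halo part and a 25 s independent part is invented for the example; the point is only that the exchange contributes nothing to the step when it is shorter than the independent computation.

```python
# Toy model of the Fig. 4 schedule: the overlap region (step S132) is
# computed first, then the independent computation (stream 2) runs
# concurrently with the exchange (stream 1), so only the longer of those
# two contributes to the time step. Timings are illustrative.

def overlapped_time_step(halo_compute_s, independent_compute_s, exchange_s):
    return halo_compute_s + max(independent_compute_s, exchange_s)

# A 12.5 s exchange hides fully behind 25 s of independent work:
step = overlapped_time_step(3.0, 25.0, 12.5)
```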
A stream is defined by creating a stream object and passing that object as a parameter when launching a kernel computation or performing a data exchange: launches with the same parameter belong to the same stream, and launches with different parameters to different streams. As in the following example:
cudaStream_t stream1, stream2;  // create two streams
Overlap_area<<<stream1>>>;      // stream 1 computes the overlapped data
Separate_area<<<stream2>>>;     // stream 2 computes the independent data
Exchange_data<<<stream1>>>;     // stream 1 exchanges the overlapped data
Save_bound_wave<<<stream1>>>;   // stream 1 saves the wavefield
In a concrete application example of this embodiment, the processing of prestack RTM data comprises forward continuation of the source wavefield and backward continuation of the receiver wavefield. Within one GPU the computation proceeds as shown in Fig. 5: step S510 is performed first to compute the overlapped data (stream 1), and step S520 then computes the independent data (stream 2).
While step S520 executes, step S530 first copies the result data of step S510 to host memory; step S540 then performs the data exchange using host memory; step S550 copies the exchanged data back to video memory; and step S560 finally reads the data from video memory.
Finally, step S570 performs the wavefield superposition according to the results read in steps S520 and S560, and either outputs the wavefield stacking result or returns to step S510 for the next round of computation. The corresponding program is as follows:
for (0 -> T) {                        // forward continuation of the source wavefield
    Overlap_area(d_wave_s, stream1);  // stream 1: source wavefield continuation in the overlap region
    Separate_area(d_wave_s, stream2); // stream 2: source wavefield continuation in the independent region
    Exchange_data(stream1);           // stream 1: data exchange
    Save_bound_wave(stream1);         // stream 1: save the computed source wavefield boundary
}
for (T -> 0) {                        // backward continuation of the receiver wavefield
    Recov_bound_wave(stream1);        // stream 1: restore the source wavefield boundary at the current time
    Overlap_area(d_wave_r, stream1);  // stream 1: receiver wavefield in the overlap region at the current time
    Overlap_area(d_wave_s, stream1);  // stream 1: source wavefield in the overlap region at the current time
    Separate_area(d_wave_s, stream2); // stream 2: source wavefield in the independent region at the current time
    Separate_area(d_wave_r, stream2); // stream 2: receiver wavefield in the independent region at the current time
    Imaging<<<dimGrid, dimBlock>>>(d_rimage, d_wave_s, d_wave_r, stream2); // stream 2: cross-correlation imaging of source and receiver wavefields at the current time
    Exchange_data(d_wave_s, stream1); // stream 1: exchange the source wavefield overlap-region data
    Exchange_data(d_wave_r, stream1); // stream 1: exchange the receiver wavefield overlap-region data
}
cudaMemcpy(RIMAGE, d_rimage, sizeof(float) * nx_apt * ny_apt * nz, cudaMemcpyDeviceToHost); // copy the computed result from GPU video memory back to host memory
Once the processing result of every data unit has been obtained, the results of all data units can be combined to obtain the processing result of the prestack RTM data. Finally, step S140 of Fig. 1, the result output step, is performed to output the final result of the prestack RTM data to the user.
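The combination step can be sketched on the CPU side as below. This is a hypothetical model, not the patent's implementation: the per-unit results are flat lists of samples, and averaging the doubly-computed halo samples is one plausible reconciliation policy chosen for the example.

```python
# Illustrative sketch of combining the per-unit results back into one
# result array. Samples covered by two units (the halos) are averaged.

def combine_unit_results(unit_results, unit_ranges, total_size):
    sums = [0.0] * total_size
    counts = [0] * total_size
    for values, (start, end) in zip(unit_results, unit_ranges):
        for offset, v in enumerate(values):
            sums[start + offset] += v
            counts[start + offset] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

parts = [[1.0] * 6, [1.0] * 6]
result = combine_unit_results(parts, [(0, 6), (4, 10)], total_size=10)
```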
The method was then tested with real data from a work area, as shown in Table 1. Two concrete datasets were assumed: processing dataset 1 requires 4 GPUs, and processing dataset 2 requires 16 GPUs. The wavefield data volume and the GPU communication volume were computed, the corresponding times were estimated, and these were compared with the total runtime of the actual run: if the total time is close to the computation time, the communication time is hidden; if the total time differs greatly from the computation time, communication and computation are not overlapped.
As shown in Table 1, with MPI alone communication and computation complete serially: dataset 1 has a communication time of 12.5 s and a total runtime of 40 s, and dataset 2 a communication time of 49.3 s and a total runtime of 161.2 s. The system of the invention uses GPU cards supporting DMA (or RDMA); combining MPI with DMA improves communication efficiency (dataset 1 communication time 5.2 s, dataset 2 communication time 25.2 s) and shortens the total runtime (dataset 1: 33.2 s, dataset 2: 137.2 s), but because the DMA (or RDMA) transfers still run serially with the computation, the total runtime remains long. With the processing system and method of the invention (RDMA plus streams), dataset 1 has a communication time of 2.6 s and a total runtime of 28.02 s, and dataset 2 a communication time of 12.4 s and a total runtime of 112.03 s. The total runtimes (28.02 s and 112.03 s) are close to the pure computation times (28 s and 112 s). This verifies that the processing system and method of the invention hide the communication time through multi-stream computation and parallel transfer, improving overall operating efficiency by about 9%.
Technology     GPU cards   Traffic   Comm. time   Comp. volume   Comp. time   Total runtime
MPI            4           8 GB      12.5 s       112 GB         28 s         40 s
MPI+DMA        4           8 GB      5.2 s        112 GB         28 s         33.2 s
RDMA+Stream    4           8 GB      2.6 s        112 GB         28 s         28.02 s
MPI            16          32 GB     49.3 s       448 GB         112 s        161.2 s
MPI+DMA        16          32 GB     25.2 s       448 GB         112 s        137.2 s
RDMA+Stream    16          32 GB     12.4 s       448 GB         112 s        112.03 s
Table 1
In summary, processing data with the method and system of the invention not only uses multi-GPU parallel processing to avoid the limits that GPU video memory places on data volume and computational load and to greatly improve data processing efficiency, but also uses the data exchange hiding strategy to shorten the computation time of each GPU.
Although the embodiments of the invention are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. The method of the invention may have various other embodiments, and those of ordinary skill in the art may make corresponding changes or variations without departing from the essence of the invention; all such changes and variations fall within the protection scope of the claims of the invention.

Claims (10)

CN201410513746.2A, filed 2014-09-29: System and method for processing prestack reverse-time offset data (pending; published as CN105510961A).


Publications (1)

Publication number: CN105510961A; publication date: 2016-04-20




Legal Events

PB01 (C06): Publication
SE01 (C10): Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2016-04-20)
