Technical Field
The invention relates to the field of information technology, and in particular to a virtual-printer-based system and method for automatically collecting electronic archives.
Background
As schools continue to build out their information systems, their departments generate ever more electronic files, most of them produced by the business systems of the management and functional departments. Archives work currently follows a dual-set model in which electronic files and paper files are managed in parallel. Take teaching, the core activity of a university: the basic information of students at every level, together with their enrollment records during their time at the school, is an important part of the teaching archives. Each year the undergraduate school, the graduate school, and the school of continuing education print these records from their own academic administration systems and hand the paper copies over to the archives for filing. The corresponding electronic files, however, are produced by the archives, which scans the paper documents after filing and then makes the scans available for use. Because the volume is large, scanning takes considerable time and labor. Scan quality must be tightly controlled, and the original paper documents must be clear enough for the scans to be usable; a problem at any step degrades overall scan quality and, ultimately, the usefulness of the archive.
The electronic files that exist before archiving therefore go unused: the archives collects paper documents and then pays to scan them back into electronic form, which is costly and inefficient, and the scan quality cannot be reliably controlled.
Summary of the Invention
To solve the above problems, the invention provides a virtual-printer-based system and method for automatically collecting electronic archives, in which files are transmitted and managed electronically.
The virtual-printer-based automatic collection system for electronic archives comprises:
a client, which issues virtual printer print commands and transmits the client's identity authentication information and the data to be printed to the server over a network; and
a server, which verifies the identity of the client user, receives the data to be printed from the client, and generates an electronic file.
Wherein:
the client encrypts the data to be printed before transmitting it over the network; on receiving the data, the server decrypts, processes, and analyzes it and generates an electronic file in a common format, each electronic file carrying an identification code;
the server encrypts the generated electronic file and transmits it to the client over the network; on receiving the electronic file, the client prompts the user to save it or to select a printer and print it as a paper document, the paper document bearing the same identification code as the electronic file.
Preferably, the identification code takes the form of a text string, a two-dimensional code image (for example, a QR code), a barcode image, or any other form that a particular device can recognize.
Preferably, the virtual printer is a software program that emulates a printer and prints documents. Identity authentication is performed in one of the following ways: the user enters an account name and password on the client, which sends a network request to the server to check whether they are correct; or the user's digital certificate is used; or a third-party identity authentication platform is used. The common-format electronic file is a PDF file. Encryption, decryption, and network transmission use TCP/IP as the network transport protocol and HTTP as the data transfer protocol.
Preferably, the processing and data analysis consist of the following functions, which the system provides to meet archiving requirements: OCR with pattern matching, identification code generation, and server-side file storage and management.
Preferably, the server-side file storage and management functions include:
(1) department-defined folders, so that each electronic file is automatically placed in the folder of the corresponding department;
(2) user-defined rules for automatically generating electronic file names;
(3) search and secondary (refined) search by metadata item;
(4) automatic classification according to the template that an electronic file matches, such as issued documents or student record cards;
(5) file compression and encrypted storage;
(6) retrieval by two-dimensional code;
(7) application of electronic official seals and electronic signature seals;
(8) batch export of electronic files through a WS (web service) interface or as XML;
(9) full backup, incremental backup, off-machine backup, and other forms of backup of electronic files;
(10) metadata analysis for specific data sources;
(11) management and upload of electronic file templates;
(12) an extensible data interface through which the electronic file corresponding to the two-dimensional code on a paper document, together with additional filing information, is passed to a "pre-filing system" or a "digital archive system".
The invention also provides a method for collecting electronic archives.
The virtual-printer-based method for automatically collecting electronic archives comprises the following steps:
1) when printing a web page, spreadsheet, or electronic document from the student records system, the OA system, or another business system on the client, the user selects the virtual printer;
2) on receiving the print request, the virtual printer checks whether the current user has been authenticated and, if not, prompts the user to first send identity authentication information over the network to the server for authentication;
3) once authentication succeeds, the client encrypts the data to be printed and transmits it over the network to the server;
4) on receiving the data, the server decrypts, processes, and analyzes it and generates an electronic file in a common format, the electronic file carrying an identification code;
5) the server encrypts the electronic file and transmits it over the network to the client;
6) on receiving the electronic file, the client prompts the user to save it or to select a physical printer and print it as a paper document bearing the identification code.
Wherein:
the identification code takes the form of a text string, a two-dimensional code image, a barcode image, or any other form that a particular device can recognize;
the virtual printer is a software program that emulates a printer and prints documents;
identity authentication is performed in one of the following ways: the user enters an account name and password on the client, which sends a network request to the server to check whether they are correct; or the user's digital certificate is used; or a third-party identity authentication platform is used;
the common-format electronic file is a PDF file;
encryption, decryption, and network transmission use TCP/IP as the network transport protocol and HTTP as the data transfer protocol;
the processing and data analysis consist of the following functions, which the system provides to meet archiving requirements: OCR with pattern matching, identification code generation, and server-side file storage and management.
Preferably, the server-side file storage and management functions include:
(1) department-defined folders, so that each electronic file is automatically placed in the folder of the corresponding department;
(2) user-defined rules for automatically generating electronic file names;
(3) search and secondary (refined) search by metadata item;
(4) automatic classification according to the template that an electronic file matches, such as issued documents or student record cards;
(5) file compression and encrypted storage;
(6) retrieval by two-dimensional code;
(7) application of electronic official seals and electronic signature seals;
(8) batch export of electronic files through a WS (web service) interface or as XML;
(9) full backup, incremental backup, off-machine backup, and other forms of backup of electronic files;
(10) metadata analysis for specific data sources;
(11) management and upload of electronic file templates;
(12) an extensible data interface through which the electronic file corresponding to the two-dimensional code on a paper document, together with additional filing information, is passed to a "pre-filing system" or a "digital archive system".
The automatic collection system for electronic archives based on virtual printing consists of two parts.
The first part is the client program, whose main job is to implement the virtual printer. When printing a web page, spreadsheet, or electronic document, the user can select this virtual printer. Because capture happens at the virtual printer, no technical changes to the business system are required and no interface has to be designed; the approach is general and independent of the business system.
The second part is the server-side program, whose main job is to verify the identity of the client user and to receive the electronic files sent by the client.
Whether for dual-set archiving of student record materials, for the documents received each year through the school's OA system, or for archiving the electronic files generated by other school business systems, the system effectively resolves the difficulties described above. It simplifies and shortens the path from filing to use, and it avoids the quality problems that can arise when paper archives are digitized.
Brief Description of the Drawings
Fig. 1 is a flow diagram of Embodiment 1.
Fig. 2 is a flow diagram of Embodiment 2.
Fig. 3 is a schematic diagram of the application model of an embodiment of the invention.
Detailed Description
The invention is further described below with reference to specific embodiments. It should be understood that the following embodiments only illustrate the invention and do not limit its scope of protection.
Embodiment 1
As shown in Fig. 1, the first solution is as follows:
(1) When printing a web page, spreadsheet, or electronic document from the student records system, the OA system, or another business system, the user selects the virtual printer.
(2) On receiving the print request, the virtual printer checks whether the current user has been authenticated and, if not, prompts the user to authenticate first.
(3) Once authentication succeeds, the client program encrypts the data to be printed and transmits it over the network to the server-side program.
(4) On receiving the data, the server-side program decrypts, processes, and analyzes it and generates an electronic file in a common format.
(5) The server-side program encrypts the electronic file and transmits it over the network to the client program.
(6) On receiving the electronic file, the client program prompts the user to save it or to select a physical printer and print it as a paper document.
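The six steps of Embodiment 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described in the patent: the names `Server`, `client_print`, and `toy_cipher` are hypothetical, the XOR-based `toy_cipher` merely stands in for real encryption, the plain-text password table is a simplification, and the "PDF" is a placeholder byte string rather than a real PDF file.

```python
import uuid


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for real encryption (assumption): XOR with a repeating key.

    Applying it twice with the same key restores the input. NOT secure.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class Server:
    """Hypothetical server-side program."""

    def __init__(self, accounts: dict, key: bytes):
        self.accounts = accounts  # account -> password (simplified for the sketch)
        self.key = key
        self.archive = {}         # identification code -> stored electronic file

    def authenticate(self, account: str, password: str) -> bool:
        return self.accounts.get(account) == password

    def handle_print(self, encrypted_job: bytes) -> bytes:
        job = toy_cipher(encrypted_job, self.key)       # step 4: decrypt
        code = uuid.uuid4().hex[:12].upper()            # identification code
        document = b"%PDF-SKETCH\nID:" + code.encode() + b"\n" + job
        self.archive[code] = document                   # keep the server-side copy
        return toy_cipher(document, self.key)           # step 5: encrypt and return


def client_print(server: Server, account: str, password: str,
                 print_data: bytes, key: bytes) -> bytes:
    """Hypothetical client-side program behind the virtual printer."""
    if not server.authenticate(account, password):      # step 2: authenticate
        raise PermissionError("identity authentication failed")
    reply = server.handle_print(toy_cipher(print_data, key))  # step 3: encrypt, send
    return toy_cipher(reply, key)                       # step 6: file to save or print
```

A caller would create a `Server` with an account table and a shared key, then call `client_print(server, account, password, data, key)`; the returned bytes are what the client would offer to save or send to a physical printer.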
Embodiment 2
As shown in Fig. 2, the second solution is as follows:
(1) When printing a web page, spreadsheet, or electronic document from the student records system, the OA system, or another business system, the user selects the virtual printer.
(2) On receiving the print request, the virtual printer checks whether the current user has been authenticated and, if not, prompts the user to authenticate first.
(3) Once authentication succeeds, the client program processes and analyzes the data to be printed and generates an electronic file in a common format.
(4) The client program encrypts the electronic file and the data produced by the analysis and transmits them over the network to the server-side program.
(5) On receiving the electronic file and the analysis data, the server-side program decrypts them and carries out further processing.
(6) The client program prompts the user to save the electronic file or to select a physical printer and print it as a paper document.
With either of the two solutions above, the user can conveniently archive electronic files automatically and produce the corresponding paper documents.
(1) Identification code
So that archive staff can quickly match a collected paper document to its electronic file during filing, an identification code is normally added to the electronic file automatically when the common-format file is generated.
The identification code can take the form of a text string, a two-dimensional code image, a barcode image, or any other form that a particular device can recognize.
With a text identification code, for example, archive staff simply type the text when filing the paper material to retrieve the corresponding electronic file. With a two-dimensional code or barcode image, they can use a device that scans such codes to retrieve the corresponding electronic file.
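As an illustration of the text form of the identification code, the sketch below derives a code from the file content and keeps an index from code to storage location, so that typed or scanned input retrieves the electronic file. The scheme (a truncated SHA-256 digest) and the names `make_identification_code` and `IdentificationIndex` are assumptions made for this example; the patent does not specify how the code is computed.

```python
import hashlib


def make_identification_code(pdf_bytes: bytes, length: int = 16) -> str:
    """Derive a text identification code from the file content (assumed scheme)."""
    return hashlib.sha256(pdf_bytes).hexdigest()[:length].upper()


class IdentificationIndex:
    """Hypothetical lookup table from identification code to stored file path."""

    def __init__(self):
        self._by_code = {}

    def register(self, pdf_bytes: bytes, path: str) -> str:
        code = make_identification_code(pdf_bytes)
        self._by_code[code] = path
        return code

    def lookup(self, code: str):
        # Normalize typed or scanned input before searching the index.
        return self._by_code.get(code.strip().upper())
```

Because the code in this sketch is derived from the content, two byte-identical files share one code; a deployment that needs a distinct code per print job would have to mix in something unique, such as a sequence number.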
(2) Virtual printer
A virtual printer is a software program that emulates a printer and prints documents. Like a physical printer, once installed it appears under "Printers and Faxes" in the Windows Control Panel and can be used just like a real printer. Double-clicking it opens it, and its "Printing Preferences" and "Properties" can be changed to set whether it is shared, when it is available, whether spooling is used, its priority, the paper size, the page layout, and so on. A virtual printer can likewise intercept the print operations of any Windows program, whether to simulate the printed result or to perform some special function.
(3) Identity authentication
Identity authentication usually takes one of the following forms:
i. the user enters an account name and password, and a network request is sent to the server-side program to check whether they are correct;
ii. the user's digital certificate is used;
iii. a third-party identity authentication platform is used;
iv. other identity authentication methods.
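For option i, the server-side check might look like the following sketch. The patent does not say how passwords are stored or compared; salted PBKDF2 hashing and a constant-time comparison are assumptions chosen here as common practice, and the names `hash_password`, `verify_password`, and `ITERATIONS` are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest) for storage on the server instead of the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest for the submitted password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The server would call `hash_password` once when the account is created and `verify_password` on each authentication request from the client.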
(4) Common-format electronic files
The system uses PDF as the common format by default. PDF (Portable Document Format) is an electronic file format that is independent of the operating system: a PDF file can be used unchanged on Windows, Unix, or Apple's Mac OS. This makes it well suited to distributing electronic documents and disseminating digital information over the Internet. PDF is based on the PostScript imaging model, so it is designed to reproduce the characters, colors, and images of the original faithfully regardless of the printer used.
(5) Encryption, decryption, and network transmission
TCP/IP is used as the network transport protocol and HTTP as the data transfer protocol; because HTTP traffic is normally permitted by firewalls, this avoids most firewall obstacles. HTTPS is the secure version of HTTP: it inserts the Secure Sockets Layer (SSL), originally developed by Netscape, as a sublayer beneath the HTTP application layer. Early export-grade SSL cipher suites used the RC4 stream cipher with a 40-bit key; such suites are now considered insecure, and SSL has been superseded by TLS. HTTPS and SSL also support X.509 digital certificates.
The SSL protocol sits between TCP/IP and the application-layer protocols and provides security for data communication. It has two layers. The SSL Record Protocol is built on a reliable transport protocol such as TCP and provides higher-level protocols with basic functions such as data encapsulation, compression, and encryption. The SSL Handshake Protocol is built on the SSL Record Protocol and is used, before data transfer begins, for the two parties to authenticate each other, negotiate the encryption algorithm, and exchange encryption keys.
Using HTTPS for network transmission between the client and the server therefore:
i. authenticates the user and the server, ensuring that data is sent to the correct client and server;
ii. encrypts the data so that it cannot be read if intercepted in transit;
iii. protects the integrity of the data, ensuring that it is not altered in transit.
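On the client side, sending the print data over HTTPS could be sketched with Python's standard library as follows. The function names, the content type, and the TLS 1.2 minimum are assumptions made for illustration; the patent only states that HTTP(S) over TCP/IP is used.

```python
import ssl
import urllib.request


def build_upload_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request carrying the print data; refuse non-HTTPS URLs."""
    if not url.lower().startswith("https://"):
        raise ValueError("print data must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def make_tls_context() -> ssl.SSLContext:
    """Default context: verifies the server certificate and host name."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy for the sketch
    return context


def upload(url: str, payload: bytes) -> bytes:
    """Send the print data and return the server's response body."""
    request = build_upload_request(url, payload)
    with urllib.request.urlopen(request, context=make_tls_context(), timeout=30) as response:
        return response.read()
```

Certificate and host-name verification in `make_tls_context` corresponds to point i above (the client is assured it is talking to the intended server), while the TLS record layer provides the encryption and integrity protection of points ii and iii.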
(6) Processing and data analysis
The system provides the following main processing functions to meet archiving requirements.
A. OCR and pattern matching
The virtual printer captures the electronic original submitted for printing. If the original is an image, OCR automatically recognizes the Chinese text, and metadata is then extracted from that text using the pattern library. If the original is a document in another format, metadata is extracted from it directly using the pattern library.
OCR can be performed with the Tesseract engine. Tesseract was originally developed at HP Labs starting in 1985, and by 1995 it ranked among the three most accurate OCR engines in the industry. It has since been released as an open-source project, hosted on Google Code at the time of writing.
The pattern library contains several types of template. The first type is built on the order in which recognized Chinese text appears. The second type is built on the X-Y position at which each metadata item appears. The third type uses a plug-in program to extract metadata precisely from a specific kind of document. A fourth type recognizes metadata on a semantic basis. These templates are used to analyze the electronic original and extract its metadata. For an issued document, the metadata includes the subject, issuing unit, document number, and date; for a grade record, it includes the student's name, grade, class, student number, and semester.
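A text-based pattern library of the kind described above can be approximated with regular expressions. The sketch below is an assumption-laden illustration: the template names, field names, and label wording (`Document No.`, `Issued by`, and so on) are invented for the example, and a real pattern library would work on the recognized Chinese text and may also use positional or semantic templates.

```python
import re

# Hypothetical pattern library: template name -> {metadata field -> regex}.
PATTERN_LIBRARY = {
    "issued_document": {
        "document_number": re.compile(r"Document No\.\s*:\s*(\S+)"),
        "issuing_unit": re.compile(r"Issued by\s*:\s*(.+)"),
        "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    },
    "grade_record": {
        "student_name": re.compile(r"Name\s*:\s*(.+)"),
        "student_number": re.compile(r"Student No\.\s*:\s*(\d+)"),
        "semester": re.compile(r"Semester\s*:\s*(.+)"),
    },
}


def extract_metadata(text: str):
    """Return (template_name, metadata) for the template that matches most fields."""
    best_name, best_fields = None, {}
    for name, fields in PATTERN_LIBRARY.items():
        found = {}
        for field, pattern in fields.items():
            match = pattern.search(text)
            if match:
                found[field] = match.group(1).strip()
        if len(found) > len(best_fields):
            best_name, best_fields = name, found
    return best_name, best_fields
```

The template that matches also gives the document's category, which is what the automatic classification function of the storage module relies on.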
B. Identification code generation
When an electronic file in the common format is generated, an identification code is automatically added to it. The identification code may be text, a two-dimensional code, a barcode, and so on.
C. Server-side file storage and management
Department-defined folders, so that each electronic file is automatically placed in the folder of the corresponding department;
User-defined rules for automatically generating electronic file names;
Search and secondary (refined) search by metadata item;
Automatic classification according to the template that an electronic file matches, such as issued documents or student record cards;
File compression and encrypted storage;
Retrieval by two-dimensional code;
Application of electronic official seals and electronic signature seals;
Batch export of electronic files through a WS (web service) interface or as XML;
Full backup, incremental backup, off-machine backup, and other forms of backup of electronic files;
Metadata analysis for specific data sources;
Management and upload of electronic file templates (that is, the pattern library);
An extensible data interface through which the electronic file corresponding to the two-dimensional code on a paper document, together with additional filing information, is passed to a "pre-filing system" or a "digital archive system".
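Three of the storage functions listed above (department folders, automatic file naming, and search with secondary search by metadata) can be illustrated with a short sketch. The folder layout, the naming rule, and the record structure are all assumptions introduced for this example; the patent leaves them to the implementation.

```python
from pathlib import PurePosixPath


def storage_path(root: str, department: str, template: str,
                 metadata: dict, code: str) -> PurePosixPath:
    """Build <root>/<department>/<template>/<label>_<code>.pdf (assumed layout)."""
    label = (metadata.get("document_number")
             or metadata.get("student_number")
             or "untitled")
    return PurePosixPath(root) / department / template / f"{label}_{code}.pdf"


def search(records: list, **criteria) -> list:
    """Return the records whose metadata equals every given criterion."""
    return [
        record for record in records
        if all(record["metadata"].get(key) == value for key, value in criteria.items())
    ]
```

A secondary search is simply a second call on the first result, for example `search(search(records, semester="Spring 2014"), student_number="20140513")`.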
D. Other functional modules
Fig. 3 shows the application model as used in practice by part-time archivists. For paper originals that must be printed for filing, the part-time archivist follows the steps above to produce the corresponding electronic file and paper material. For paper archival items that already exist together with their electronic files, modest changes to the business system allow the system to obtain the corresponding catalog data and electronic files, which serve as the main archival material. The process information and process documents produced while the business is handled can still be printed through the steps above to produce paper originals and electronic files of the business process.
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN201410202184.XA | CN103973692B (en) | 2014-05-13 | 2014-05-13 | Electronic record automated collection systems based on virtual printing and acquisition method |
| Publication Number | Publication Date |
|---|---|
| CN103973692A (en) | 2014-08-06 |
| CN103973692B (en) | 2018-09-14 |
| Application Number | Status | Publication | Priority Date | Filing Date |
|---|---|---|---|---|
| CN201410202184.XA | Expired - Fee Related | CN103973692B (en) | 2014-05-13 | 2014-05-13 |
| Publication number | Publication date |
|---|---|
| CN103973692A (en) | 2014-08-06 |
| Publication | Publication Date | Title |
|---|---|---|
| CN103973692B (en) | | Electronic record automated collection systems based on virtual printing and acquisition method |
| CN103985073A (en) | | Automatic electronic file collection system based on virtual printing and use method thereof |
| US20240129130A1 (en) | | System And Method For Authenticating Digitally Signed Documents |
| CN101167297B (en) | | Method and apparatus for adding signature information to electronic documents |
| US20010034835A1 (en) | | Applied digital and physical signatures over telecommunications media |
| CN100458819C (en) | | Image processing method, image processing apparatus, and storage medium |
| US9137405B2 (en) | | System for creating certified document copies |
| CN1885327B (en) | | Image output system having image log recording function, and log recording method in image output system |
| BRPI0615658A2 (en) | | directed signature method, means and workflow system |
| CN116842909B (en) | | Intelligent signature method and system |
| US7970169B2 (en) | | Secure stamping of multimedia document collections |
| JP5046984B2 (en) | | Information processing apparatus, information processing method, and program |
| JP2012008942A (en) | | Document management system, document management device, document management method and document management program |
| KR100858103B1 (en) | | Template-based Web Document Publishing System |
| TWM520159U (en) | | Device for generating and identifying electronic document containing electronic authentication and paper authentication |
| KR20130011868A (en) | | Method for generating electronic document available at mobile devices and device of producing the same |
| KR20090112840A (en) | | Method and system for managing document copyright and recording medium therefor |
| CN104426898A (en) | | Server, terminal, digital rights management system and digital rights management method |
| CA3074806A1 (en) | | A system and method for authenticating digitally signed documents |
| JP6352221B2 (en) | | Web page print document verification system |
| TWI595380B (en) | | Device for generating or verifying authenticate electronic document with electronic and paper certification and method thereof |
| US10887476B1 (en) | | Use of published electronic documents to enable automated communication between readers and authors |
| KR20040040271A (en) | | Method for proving contents of printout and apparatus thereof |
| JP6248494B2 (en) | | Information processing apparatus, information processing method, and information processing program |
| KR20210093621A (en) | | PDF Publishing System for Web and SNS contents based on templates |
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180914 |
| | CF01 | Termination of patent right due to non-payment of annual fee | |