An audit checking method and system based on text mining analysis technology
Technical field
The present invention relates to the application of data mining in auditing, and specifically to an audit checking method and system based on text mining technology.
Background art
With the arrival of the big data era, auditing, as a comprehensive economic supervision function, faces the major challenges that this era brings. Business systems generate massive amounts of unstructured data, and manual auditing alone is far from sufficient. The auditing of unstructured data has become a blind spot in audit work, and there is an urgent need for high-tech means and tools that analyze and mine unstructured data to provide data support for audit work.
Text mining is used to obtain information that is of interest or use to the user from unstructured text. It covers multiple technologies, including information extraction, information retrieval, natural language processing, and data mining, and is mainly used to extract previously unknown knowledge from text that was not originally exploited.
Existing audits generally rely on manual data extraction, which is prone to data omissions, and no automated audit system has yet been established for unstructured data in the audit field.
Summary of the invention
The technical problem solved by the present invention is that the audit field has not yet formed an automated audit system.
The solution by which the present invention solves this technical problem is as follows. In one aspect, an audit checking method based on text mining analysis technology comprises the steps of:
S1, extracting enterprise contract data from an enterprise contract management system, extracting contract key information, and storing it in structured form;
S2, checking the extracted contract key information against fund flow data.
Further, the enterprise contract data includes contract documents, and the document format of the contract documents is any one of pdf, doc, and docx.
Further, the contract key information includes: contract payment information, contract total price, first payment time, first payment amount, second payment time, and second payment amount.
Further, step S1 includes:
S11, reading the contract documents using a document reading technique;
S12, formulating a rule base for extracting contract key information, and using the rule base to extract the contract key information by a text extraction technique;
S13, establishing a data table, and storing the contract key information extracted in step S12 in the data table.
Further, step S2 includes:
S21, extracting fund flow data from a financial system;
S22, matching the fund flow data against the contract key information extracted in step S12 according to pre-formulated audit rules;
S23, grouping the contract key information that fails the matching.
In another aspect, an audit checking system based on text mining analysis technology is provided, comprising a data acquisition module, a contract key information extraction module, and an audit checking module. The data acquisition module is used to extract enterprise contract data; the contract key information extraction module is used to extract contract key information from the enterprise contract data; and the audit checking module is used to match the contract key information against fund flow data according to pre-formulated audit rules.
Further, the system also includes a front-end display module, which is used to display the enterprise contract data extracted by the data acquisition module and the contract key information extracted from the enterprise contract data by the contract key information extraction module.
The beneficial effects of the present invention are as follows. In one aspect, the audit checking method provided by the present invention uses text mining technology to automatically extract contract key information from enterprise contracts and form structured data, which is compared with the fund flow data returned by the bank in the financial system. By formulating audit issue rules, audit issues are found and grouped, so that audit issues are classified and issues of the same class can be audited together. Compared with previous methods of auditing enterprise contract data, the present invention has the following advantages: first, the contract key information in enterprise contracts is extracted automatically, which saves considerable labor and financial cost compared with previous manual extraction; second, the contract data is checked automatically against the fund flow data returned by the bank, so that any mismatch can be found and issues are not missed through human negligence.
In another aspect, the present invention also provides a system for executing this method.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the described drawings show only some embodiments of the present invention rather than all of them, and those skilled in the art can obtain other designs and drawings from these drawings without creative effort.
Fig. 1 is a step flow chart of the audit checking method of the present invention;
Fig. 2 is a system block diagram of the audit checking system of the present invention.
Detailed description of the embodiments
The concept, specific structure, and technical effects of the present invention are described clearly and completely below with reference to the embodiments and drawings, so that the purpose, features, and effects of the present invention can be fully understood. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them; other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort all fall within the scope of protection of the present invention. In addition, the connection relations mentioned herein do not refer only to direct connection of components; a better connection structure may be formed by adding or removing auxiliary connecting parts according to the specific implementation. The technical features of the invention may be combined with one another provided they do not conflict.
Embodiment 1. The invention discloses an audit checking method based on text mining technology, comprising the following steps:
S1, extracting enterprise contract data from an enterprise contract management system, extracting contract key information, and storing it in structured form; wherein the enterprise contract data includes contract documents, and the document format of the contract documents is any one of pdf, doc, and docx; the contract key information includes contract payment information, such as the contract total price, first payment time, first payment amount, second payment time, and second payment amount;
S2, checking the extracted contract key information against fund flow data.
With reference to Fig. 1, the specific implementation of the above steps is described in detail as follows:
S11, reading the contract documents using a document reading technique;
S12, formulating a rule base for the contract key information, and using the rule base to extract the contract key information by a text extraction technique;
S13, establishing a data table in a database, and storing the contract key information extracted in step S12 in the data table;
Step S13 achieves structured storage of unstructured information by establishing the data table.
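As an illustration of step S13, the following minimal sketch stores extracted key information in a relational table using Python's built-in sqlite3 module. The table name contract_key_info, the column names, and the sample record are assumptions made for this example; the embodiment does not specify a database product or schema.

```python
import sqlite3

# Hypothetical schema: one row per contract, one column per key-information field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS contract_key_info (
    contract_id         TEXT PRIMARY KEY,
    total_price         REAL,
    first_payment_time  TEXT,
    first_payment_amt   REAL,
    second_payment_time TEXT,
    second_payment_amt  REAL
)
"""

def store_key_info(conn, record):
    """Insert (or overwrite) one extracted contract record given as a dict."""
    conn.execute(
        "INSERT OR REPLACE INTO contract_key_info VALUES "
        "(:contract_id, :total_price, :first_payment_time, :first_payment_amt, "
        ":second_payment_time, :second_payment_amt)",
        record,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(SCHEMA)
store_key_info(conn, {
    "contract_id": "HT-001",          # made-up sample values
    "total_price": 100000.00,
    "first_payment_time": "2020/01/15",
    "first_payment_amt": 30000.00,
    "second_payment_time": "2020/06/15",
    "second_payment_amt": 70000.00,
})
rows = conn.execute(
    "SELECT contract_id, total_price FROM contract_key_info"
).fetchall()
```

Keeping one column per field means the later matching step can query payment times and amounts directly instead of re-parsing contract text.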
S21, extracting fund flow data from a financial system;
In step S21, the fund flow data is the data returned by the bank to the financial system, including: payment time, payment amount, and payer.
S22, matching the fund flow data against the contract key information extracted in step S12 according to pre-formulated audit rules;
For example: the payment time C1 and payment amount C2 are captured from the fund flow data, and the payment time D1 and payment amount D2 in Company B's contract are captured by text techniques; according to the audit rules, payment time C1 is compared with D1 and payment amount C2 with D2 to determine whether the transactions between the parties comply with the audit rules.
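The comparison in this example can be sketched as below. The function name compare_payment, the result keys, and the use of exact equality are assumptions for illustration; the audit rules in a real deployment might allow tolerances on dates or amounts.

```python
from datetime import date
from decimal import Decimal

def compare_payment(c1, c2, d1, d2):
    """Compare an actual payment (C1, C2) with the contract terms (D1, D2)."""
    return {
        "time_matches": c1 == d1,    # paid exactly on the contractual date
        "paid_early": c1 < d1,       # paid before the contractual date
        "paid_late": c1 > d1,        # paid after the contractual date
        "amount_matches": c2 == d2,  # paid exactly the contractual amount
    }

# Made-up sample: paid five days early, with the correct amount.
result = compare_payment(
    c1=date(2020, 1, 10), c2=Decimal("30000.00"),
    d1=date(2020, 1, 15), d2=Decimal("30000.00"),
)
```

Decimal is used for amounts so that values such as 30000.00 compare exactly, which binary floating point does not guarantee in general.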
S23, labeling and grouping the contract key information that fails the matching.
By labeling and grouping the contract key information that fails the matching, step S23 makes it convenient to handle audit issues of the same class together.
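A minimal sketch of the labeling and grouping in step S23 follows. The contract identifiers and the label strings are hypothetical; the only point shown is that each label ends up mapped to the contracts that carry it.

```python
from collections import defaultdict

def group_by_label(findings):
    """Group (contract_id, label) pairs so each label maps to its contracts."""
    groups = defaultdict(list)
    for contract_id, label in findings:
        groups[label].append(contract_id)
    return dict(groups)

# Hypothetical findings produced by the matching step.
findings = [
    ("HT-001", "advance payment"),
    ("HT-002", "amount mismatch"),
    ("HT-003", "advance payment"),
]
groups = group_by_label(findings)
```

An auditor can then work through one group at a time, for instance all contracts labeled as advance payments.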
The contract key information includes contract payment information, such as the contract total price, first payment time, first payment amount, second payment time, and second payment amount. Because this information has a fixed format, a rule base is formulated for the contract key information, and the contract key information is extracted by a text extraction technique using the rule base. For example, to extract the data for "contract total price" and "first payment time", the extraction rules are: the keyword "contract total price" + an amount (regular expression ((^[-]([1-9]\d*))|^0)(\.\d{1,2})$|(^[-]0\.\d{1,2}$)); and the keyword "first payment time" + time-format data (a time in YYYY/MM/DD form, regular expression ^\d{4}(-|/|)\d{1,2}\1\d{1,2}$). The rule base and the text capture technique of this embodiment are both implemented using the PCRE tool.
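The keyword-plus-pattern rules can be sketched as follows. This example uses Python's built-in re module rather than PCRE, and the names RULE_BASE and extract_key_info, the English keywords, and the sample sentence are all assumptions. The date pattern is a reading of the date rule quoted above; the amount pattern is a simplified stand-in (non-negative amount with at most two decimals), not the expression from the embodiment.

```python
import re

# Validation patterns (see the assumptions stated in the lead-in).
DATE_RE = re.compile(r"^\d{4}(-|/|)\d{1,2}\1\d{1,2}$")
AMOUNT_RE = re.compile(r"^(0|[1-9]\d*)(\.\d{1,2})?$")

# Hypothetical rule base: field name -> (keyword, validator for the value).
RULE_BASE = {
    "total_price": ("contract total price", AMOUNT_RE),
    "first_payment_time": ("first payment time", DATE_RE),
}

def extract_key_info(text):
    """Return the fields whose keyword is followed by a well-formed value."""
    result = {}
    for field, (keyword, validator) in RULE_BASE.items():
        # Capture the first whitespace-free token after the keyword
        # and an optional colon.
        m = re.search(re.escape(keyword) + r"\s*:?\s*(\S+)", text, re.IGNORECASE)
        if m:
            value = m.group(1).rstrip(".,;")  # drop trailing punctuation
            if validator.match(value):
                result[field] = value
    return result

sample = "Contract total price: 100000.00; first payment time: 2020/01/15."
info = extract_key_info(sample)
```

The backreference \1 in the date pattern forces the second separator to equal the first, so a mixed form such as 2020-01/15 is rejected.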
The pre-formulated audit rules described in step S22 can be defined according to the audit issues of concern (such as payment not executed according to the contract terms A1, advance payment A2, and inconsistent payment amount A3). The rules take the form B*: A* ... A*, for example B1: A1, B2: A2, B3: A3, B4: A2A3, where B* is the rule number and A* ... A* are the conditions that the rule requires.
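One way to read the B*: A* ... A* notation is sketched below. It assumes that a rule listing several conditions (such as B4: A2A3) fires only when all of them hold; the embodiment does not state this explicitly, and the function names are hypothetical.

```python
import re

# Rule definitions in the B*:A*...A* form used in the text.
RULE_TEXT = ["B1:A1", "B2:A2", "B3:A3", "B4:A2A3"]

def parse_rule(text):
    """Split 'B4:A2A3' into ('B4', {'A2', 'A3'})."""
    rule_id, conditions = text.split(":")
    return rule_id.strip(), set(re.findall(r"A\d+", conditions))

RULES = dict(parse_rule(t) for t in RULE_TEXT)

def triggered_rules(observed):
    """Return the rules whose conditions are all in the observed set."""
    return sorted(rule for rule, conds in RULES.items() if conds <= observed)

# A payment made in advance (A2) whose amount is also inconsistent (A3).
hits = triggered_rules({"A2", "A3"})
```

Storing each rule as a set of condition codes keeps the check to a single subset test, and new rules can be added as text without changing the code.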
In conclusion the audit auditing method that the present embodiment is provided, is closed automatically from enterprise by Text Mining TechnologyWith middle extraction contract key message, structural data is formed, the fund flow data returned with bank in financial system is compared,By formulating audit issues rule, finding audit issues and being grouped, realizes and audit issues are sorted out, to be carried out to same problemsConcentrate audit.
Compared with previous enterprise's contract audit data method, the invention has the advantages that:One, enterprise's contract is automatically extractedIn contract key message save prodigious people's financial resources cost compared with previous artificial extraction;Two, by contract dataset withThe fund flow data that bank returns is checked automatically, it can be found that any unmatched problem, avoids missing because of human negligenceSome problems.
With reference to Fig. 2, the system comprises a data acquisition module A10, a contract key information extraction module A20, and an audit checking module A30. The data acquisition module A10 acquires the contract document data stored in the enterprise contract management system and displays it through the front-end display module A4. The information extraction module A20 extracts the contract key information from the contract documents, extracts the fund flow data from the financial system, and displays both through the front-end display module A4, which lets users know promptly whether the extracted contract key information is correct.
The audit checking module A30 includes an audit rule formulation module A31 and a matching module A32. The audit rule formulation module A31 can formulate audit rules according to the audit issues of concern. The matching module A32 matches the extracted contract key information against the fund flow data, finds audit issues according to the formulated audit rules, and visualizes them through the front-end display module A4.
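The module structure of Fig. 2 can be sketched as a small composition of callables. The class name AuditSystem, the callable signatures, and the stub functions in the usage example are assumptions; in particular, the check callable is assumed to have access to the fund flow data and audit rules itself, which simplifies the roles of modules A31 and A32.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Contract = Dict[str, str]  # extracted key information for one contract
Finding = Dict[str, str]   # one audit finding

@dataclass
class AuditSystem:
    acquire: Callable[[], List[str]]            # stands in for module A10
    extract: Callable[[str], Contract]          # stands in for module A20
    check: Callable[[Contract], List[Finding]]  # stands in for module A30
    display: Callable[[str, object], None]      # stands in for module A4

    def run(self) -> List[Finding]:
        """Run acquisition, extraction, and checking, displaying each stage."""
        findings: List[Finding] = []
        for document in self.acquire():
            self.display("contract document", document)
            info = self.extract(document)
            self.display("key information", info)
            result = self.check(info)
            self.display("findings", result)
            findings.extend(result)
        return findings

# Usage with stub modules (all hypothetical).
shown = []
system = AuditSystem(
    acquire=lambda: ["contract total price: 100"],
    extract=lambda doc: {"total_price": doc.split(": ")[1]},
    check=lambda info: [] if info["total_price"] == "100" else [{"issue": "amount"}],
    display=lambda kind, payload: shown.append(kind),
)
findings = system.run()
```

Passing the modules in as callables keeps each one replaceable, for example swapping the stub extractor for a rule-base extractor without touching the pipeline.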
The preferred embodiments of the present invention have been described above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or replacements without departing from the spirit of the invention, and all such equivalent modifications or replacements fall within the scope defined by the claims of this application.