Brand intellectual property protection platform based on commodity network genes
Technical field
The present invention relates to a rights-protection network platform, and in particular to a brand intellectual property protection platform based on commodity network genes.
Background art
At present, the anti-counterfeiting industry lags behind technologically. Counterfeit goods are generally identified through manual inspection or consumer reports, and brand owners then protect their intellectual property after the fact through legal action. This approach is costly and only moderately effective. With the development of e-commerce, the particular nature of e-commerce platforms makes sellers of counterfeit goods harder to detect and infringement harder to combat. Because an individual enterprise has limited technical capability and funds, it is difficult for it to extract information about counterfeit sales from massive data through effective data analysis. Although enterprises invest heavily every year and commit ever more manpower and material resources, they struggle to guard against the growing volume of counterfeit infringement, and the limitations of traditional anti-counterfeiting and rights-protection methods have become apparent.
Summary of the invention
To overcome the above defects and deficiencies of the prior art, the present invention provides a brand intellectual property protection platform based on commodity network genes that can handle a larger market effectively with less manpower and fewer material resources, reducing the cost of protecting an enterprise's intellectual property and thereby improving economic benefit.
Technical solution of the present invention: a brand intellectual property protection platform based on commodity network genes, comprising a data source module, a data collection module, a data integration module, a data storage module, a data analysis module, a target detection module, a visualization module, and a data application module, wherein:
the data collection module, when collecting data from the data sources, uses a distributed whole-network commodity data crawling system built on the open-source Hadoop platform;
the data integration module uses the SKU library and SKU feature library established by the system to uniquely identify commodities from different sources in the data collected from the data sources, and structures and cleans unstructured data;
the data storage module stores the integrated data in a data warehouse to support data analysis;
the data analysis module structures the large volume of unstructured commodity comment data;
the target detection module detects suspected infringing commodities by analysis, using an established infringing-commodity recognition model;
the visualization module presents the detected suspected infringing commodities to the client through a visual interface.
The distributed whole-network commodity data crawling system built on the open-source Hadoop platform that the present invention adopts has the following features: 1) High performance and high stability. The system implements multithreaded distributed crawling; independent crawl processes do not affect one another, and when a crawl task fails it is recovered automatically, achieving crawler job stability of more than 99.99%. The crawler can also be scaled out horizontally and rapidly according to business needs. 2) Scheduling algorithm. The crawling system assigns crawl work weights according to each client's importance and the time of its last monitoring run, so that new clients and important clients are served quickly. 3) Crawl context memory. The system records the context in which the data for each commodity page from each source was last crawled, ensuring that the crawling system updates data incrementally.
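The scheduling feature described above can be illustrated with a minimal Python sketch. The source states only that crawl weight depends on client importance and the time of the last monitoring run; the scoring formula, the weights, and the names `crawl_priority` and `schedule` are assumptions made for illustration and are not the platform's actual algorithm.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    priority: float                      # negated score, so the smallest value is most urgent
    client: str = field(compare=False)   # client name is not used for ordering


def crawl_priority(importance: float, hours_since_last_crawl: float, is_new: bool) -> float:
    """Higher score means crawl sooner. All weights are illustrative assumptions."""
    score = importance * 10.0 + hours_since_last_crawl
    if is_new:
        score += 100.0  # new clients are served first
    return score


def schedule(clients):
    """Return client names in crawl order, most urgent first."""
    heap = []
    for c in clients:
        score = crawl_priority(c["importance"], c["hours_since_last_crawl"], c["is_new"])
        heapq.heappush(heap, CrawlTask(-score, c["name"]))  # negate: heapq is a min-heap
    return [heapq.heappop(heap).client for _ in range(len(heap))]


clients = [
    {"name": "A", "importance": 1, "hours_since_last_crawl": 5, "is_new": False},
    {"name": "B", "importance": 5, "hours_since_last_crawl": 2, "is_new": False},
    {"name": "C", "importance": 1, "hours_since_last_crawl": 1, "is_new": True},
]
order = schedule(clients)
```

In this sketch a new client outranks an important existing client, which in turn outranks a low-importance client that was crawled recently; a production scheduler would tune these weights against real workloads.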
In the data integration module of the present invention, metadata definition is the most important preliminary step of data cleaning. The SKU (Stock Keeping Unit) is the smallest unit in which a commodity appears in the e-commerce sales process, but commodities sold on the Internet carry varied titles and different commodity codes, so protecting commodity intellectual property requires defining SKU metadata for the commodities to be protected. According to its anti-counterfeiting needs, the present invention defines the SKU format and recognition features of the commodity metadata on its own platform (see the commodity metadata SKU definition in Fig. 3). Using the open interfaces of each platform and its own data acquisition system, it integrates the corresponding structured, semi-structured, and unstructured data scattered across the major e-commerce platforms and social media platforms into the commodity library of its own data platform, providing a basis for further mining of commodity data.
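A minimal Python sketch of SKU-based uniqueness identification follows. The actual SKU format is defined in Fig. 3 and is not reproduced in the text, so the fields `brand`, `model`, and `color`, and the helper names `normalize`, `sku_key`, and `group_by_sku`, are hypothetical and serve only to show how listings with differing titles and codes can be mapped to one key.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuKey:
    # Hypothetical recognition features; the real SKU definition is given in Fig. 3.
    brand: str
    model: str
    color: str


def normalize(text: str) -> str:
    """Lowercase and drop punctuation, underscores, and whitespace so spelling variants collide."""
    return re.sub(r"[\W_]+", "", text.lower())


def sku_key(listing: dict) -> SkuKey:
    return SkuKey(
        normalize(listing["brand"]),
        normalize(listing["model"]),
        normalize(listing["color"]),
    )


def group_by_sku(listings):
    """Map each SKU key to the URLs of all listings identified as that commodity."""
    groups = {}
    for listing in listings:
        groups.setdefault(sku_key(listing), []).append(listing["url"])
    return groups


listings = [
    {"brand": "ACME", "model": "Model-X 100", "color": "Black", "url": "shop-a/1"},
    {"brand": "Acme", "model": "MODEL X100", "color": "black ", "url": "shop-b/7"},
    {"brand": "Acme", "model": "Model Y", "color": "black", "url": "shop-b/8"},
]
groups = group_by_sku(listings)
```

Exact matching on normalized fields is the simplest possible choice; real listings would likely need fuzzy matching against the SKU feature library.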
Preferably, the data collection module collects data from each independent channel. These data comprise the enterprise's own data, that is, all related product data that can be collected on the enterprise's own platforms; data on related products on e-commerce platforms; data on related products on microblogging platforms; and various commodity-related data in other relevant forums.
Preferably, the data analysis module first uses natural language processing techniques to extract product features and user opinion keywords; it then establishes a Chinese polarity judgment dictionary that defines the polarity of the opinion expressed by each keyword; finally, through keyword polarity judgment, it converts commodity comments into a computable data format.
Preferably, feature keywords are extracted mainly through comment text preprocessing, high-frequency word statistics, syntactic dependency analysis of low-frequency words, and manual addition, so as to extract commodity comment features and cover essentially all of the main comment features in the commodity comment information.
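The high-frequency statistics and manual addition steps can be sketched in Python as below. This is a simplified illustration under stated assumptions: tokens are split on whitespace (Chinese text would first need a word segmenter), the stopword list and the `min_count` threshold are invented for the example, and the syntactic dependency step for low-frequency words is omitted.

```python
from collections import Counter

# Illustrative stopword list; a real system would use a full lexicon.
STOPWORDS = {"the", "is", "a", "and", "very", "but", "of", "it"}


def candidate_features(comments, min_count=2, manual_terms=()):
    """Return words that occur in at least `min_count` comments, plus manually added terms."""
    counts = Counter()
    for comment in comments:
        # Use a set so each comment counts a word once (document frequency).
        tokens = {t.strip(".,!?").lower() for t in comment.split()}
        counts.update(t for t in tokens if t and t not in STOPWORDS)
    frequent = {t for t, n in counts.items() if n >= min_count}
    return frequent | set(manual_terms)


comments = [
    "The zipper is broken",
    "zipper feels cheap",
    "Nice color and good zipper",
    "color faded",
]
features = candidate_features(comments, min_count=2, manual_terms=("stitching",))
```

Here "zipper" and "color" pass the frequency threshold, while "stitching" enters only through manual addition, mirroring the combination of statistical and manual sources described above.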
Preferably, existing product feature extraction methods are reviewed and analyzed, and the product feature word extraction method based on statistics and pattern matching is further improved.
Preferably, the effect on opinion word extraction of several sentence dependency analysis methods, such as those based on maximum entropy, SVM, and decision trees, is studied and these methods are improved, further raising the extraction accuracy for product feature words and user opinions.
Preferably, the Chinese polarity judgment dictionary is built on HowNet; it is extended with a network semantic polarity dictionary, and a synonym dictionary is added so that polarity can be judged and analyzed for synonyms; a programmatic judgment of polarity intensity is also added, improving the accuracy of polarity judgment for polar words and their synonyms in user evaluations.
Preferably, the Chinese polarity judgment dictionary is used to structure users' comment opinions into a computable data format.
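As a minimal Python sketch of this structuring step, the example below maps already-extracted (feature, opinion word) pairs to a numeric vector. The dictionary entries, intensity values, feature list, and the names `opinion_score` and `structure_comment` are all illustrative assumptions; the source's dictionary is built on HowNet and is not reproduced here.

```python
# Toy polarity dictionary: sign gives polarity, magnitude gives intensity (assumed values).
POLARITY = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "fake": -2.0}
# Toy synonym dictionary mapping a word to its dictionary headword (assumed entries).
SYNONYMS = {"great": "good", "poor": "bad", "counterfeit": "fake"}
# Fixed feature order defines the layout of the output vector (assumed features).
FEATURES = ["zipper", "color", "leather"]


def opinion_score(word: str) -> float:
    """Resolve synonyms, then look up polarity; unknown words are neutral (0.0)."""
    word = SYNONYMS.get(word, word)
    return POLARITY.get(word, 0.0)


def structure_comment(pairs):
    """Convert (feature, opinion word) pairs from one comment into a polarity vector."""
    vector = [0.0] * len(FEATURES)
    for feature, opinion in pairs:
        if feature in FEATURES:
            vector[FEATURES.index(feature)] += opinion_score(opinion)
    return vector


vector = structure_comment([("zipper", "poor"), ("color", "great"), ("leather", "counterfeit")])
```

Each comment thus becomes a fixed-length vector with one signed score per product feature, which is a form that later statistical models can consume directly.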
Preferably, the target detection module uses, as a training set, data mined from the genuine-product sales pages provided by the client manufacturer; extracts feature vectors consisting of the genuine commodity's price and the polarity of users' opinions on product features; and, by means of a commodity authenticity verification model, verifies the authenticity of data from other sales pages of the same commodity, located across the whole network by its unique SKU, obtaining the probability that the commodity is genuine.
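The source does not specify the form of the commodity authenticity verification model. The Python sketch below is one possible stand-in, assuming a simple one-class approach: it learns the mean and spread of each feature from genuine listings only and maps a candidate's normalized distance from that profile to a score in (0, 1]. The function names and the sample data are hypothetical, and the score is a heuristic rather than a calibrated probability.

```python
import math
from statistics import mean, pstdev


def fit_genuine_profile(genuine_vectors):
    """Learn per-feature mean and standard deviation from genuine listings only."""
    columns = list(zip(*genuine_vectors))
    means = [mean(c) for c in columns]
    stds = [max(pstdev(c), 1e-6) for c in columns]  # floor avoids division by zero
    return means, stds


def genuine_score(vector, profile):
    """Score near 1.0 for listings close to the genuine profile, near 0.0 for distant ones."""
    means, stds = profile
    z2 = [((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds)]
    return math.exp(-0.5 * sum(z2) / len(z2))


# Each vector: [price, aggregated opinion polarity]; the values are invented.
genuine = [[100.0, 1.0], [110.0, 1.2], [90.0, 0.8]]
profile = fit_genuine_profile(genuine)
typical = genuine_score([100.0, 1.0], profile)
suspect = genuine_score([30.0, -1.5], profile)
```

A listing priced far below the genuine range with strongly negative opinions scores close to zero and would be flagged as suspected infringement; a trained classifier with labeled counterfeit examples could replace this heuristic where such labels exist.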
Preferably, for commodities found to be infringing, the data application module contacts and communicates with the client through the connected rights-protection service platform; through legal action, complaints, positive guidance, and similar means, it directly provides rights-protection services to the enterprise, effectively protecting the enterprise's intellectual property from infringement.
The present invention applies big data to intellectual property protection. Using an independently developed comment analysis system and proprietary semantic big data analysis technology, it helps brand manufacturers analyze user comments and feedback on e-commerce platforms, accurately locate infringing commodities and sellers, and, through the series of follow-up rights-protection solutions provided by the company, protect the enterprise's intellectual property from infringement. Compared with traditional anti-counterfeiting methods, the present invention can handle a larger market effectively with less manpower and fewer material resources, reducing the cost of protecting an enterprise's intellectual property and thereby improving economic benefit.
Brief description of the drawings
Fig. 1 is the technical roadmap of the present invention;
Fig. 2 is a schematic diagram of the distributed whole-network commodity data crawling system built on the Hadoop platform in the present invention;
Fig. 3 is a schematic diagram of the commodity metadata SKU definition in the present invention;
Fig. 4 is a schematic diagram of data analysis in the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments, which do not limit the scope of the invention.
As shown in Fig. 1, the present invention mainly comprises four parts:
1. Data source module
Data are collected from each independent channel. These data comprise the enterprise's own data, that is, all related product data that can be collected on the enterprise's own platforms; data on related products on e-commerce platforms; data on related products on microblogging platforms; and various commodity-related data in other relevant forums.
2. Data integration
The data integration part comprises three modules: data collection, data integration, and data storage.
(1) Data collection. When collecting data from the data sources, a distributed whole-network commodity data crawling system built on the open-source Hadoop platform is used (as shown in Fig. 2). The system has the following features: 1) High performance and high stability. The system implements multithreaded distributed crawling; independent crawl processes do not affect one another, and when a crawl task fails it is recovered automatically, achieving crawler job stability of more than 99.99%. The crawler can also be scaled out horizontally and rapidly according to business needs. 2) Scheduling algorithm. The crawling system assigns crawl work weights according to each client's importance and the time of its last monitoring run, so that new clients and important clients are served quickly. 3) Crawl context memory. The system records the context in which the data for each commodity page from each source was last crawled, ensuring that the crawling system updates data incrementally.
(2) Data integration. Using the SKU library and SKU feature library established by the system (as shown in Fig. 3), commodities from different sources in the data collected from the data sources are uniquely identified, and unstructured data are structured and cleaned. Metadata definition is the most important preliminary step of data cleaning. The SKU (Stock Keeping Unit) is the smallest unit in which a commodity appears in the e-commerce sales process, but commodities sold on the Internet carry varied titles and different commodity codes, so protecting commodity intellectual property requires defining SKU metadata for the commodities to be protected. According to its anti-counterfeiting needs, the present invention defines the SKU format and recognition features of the commodity metadata on its own platform (see the commodity metadata SKU definition in Fig. 3). Using the open interfaces of each platform and its own data acquisition system, it integrates the corresponding structured, semi-structured, and unstructured data scattered across the major e-commerce platforms and social media platforms into the commodity library of its own data platform, providing a basis for further mining of commodity data.
(3) Data storage. Finally, the data storage module stores the integrated data in a data warehouse to support data analysis.
3. Data analysis
This part comprises three modules: data analysis, target detection, and visualization (as shown in Fig. 4).
(1) Data analysis. The data analysis module mainly structures the large volume of unstructured commodity comment data. It first uses natural language processing techniques to extract product features and user opinion keywords; it then establishes a Chinese polarity judgment dictionary that defines the polarity of the opinion expressed by each keyword. Finally, through keyword polarity judgment, it converts commodity comments into a computable data format.
1) Feature word extraction. Feature words are extracted mainly through comment text preprocessing, high-frequency word statistics, syntactic dependency analysis of low-frequency words, and manual addition, so as to extract commodity comment features and cover essentially all of the main comment features in the commodity comment information. Within natural language processing, opinion mining is one of the key technologies of this module. Research here focuses on reviewing and analyzing existing product feature extraction methods and further improving the product feature word extraction method based on statistics and pattern matching. The effect on opinion word extraction of several sentence dependency analysis methods, such as those based on maximum entropy, SVM, and decision trees, has been studied and these methods improved, further raising the extraction accuracy for product feature words and user opinions.
2) Chinese polarity judgment dictionary. The Chinese polarity judgment dictionary is built on HowNet; it is extended with a network semantic polarity dictionary, and a synonym dictionary is added so that polarity can be judged and analyzed for synonyms; a programmatic judgment of polarity intensity is also added, improving the accuracy of polarity judgment for polar words and their synonyms in user evaluations.
3) Comment opinion structuring. Using the polarity judgment dictionary, users' comment opinions are structured into a computable data format.
(2) Target detection. The target detection module detects suspected infringing commodities by analysis, using an established infringing-commodity recognition model. It uses, as a training set, data mined from the genuine-product sales pages provided by the client manufacturer; extracts feature vectors consisting of the genuine commodity's price and the polarity of users' opinions on product features; and, by means of a commodity authenticity verification model, verifies the authenticity of data from other sales pages of the same commodity, located across the whole network by its unique SKU, obtaining the probability that the commodity is genuine.
(3) Visualization. The visualization module presents the detected suspected infringing commodities to the client through a visual interface.
4. Data application
For commodities found to be infringing, the client is contacted through the connected rights-protection service platform. Through legal action, complaints, positive guidance, and similar means, rights-protection services are provided directly to the enterprise, effectively protecting the enterprise's intellectual property from infringement.
The present invention is mainly directed at e-commerce platforms. Using a world-leading Chinese semantic big data analysis system, it accurately collects and analyzes all comment data from online shops on the major e-commerce platforms, rejects invalid comments, and helps enterprises uncover the various infringing commodities in the e-commerce market. Through the relevant laws and the intellectual property protection rules of the e-commerce platforms, counterfeit commodities and the merchants selling them are removed, effectively reducing the enterprise's anti-counterfeiting costs and protecting its intellectual property from infringement. Drawing on accumulated industry and technical experience, a visual intellectual property protection platform is established, providing a reliable channel for brand owners' intellectual property protection and for consumers' rights protection.
The company has successfully established cooperative relationships with 15 domestic brands, large and small, such as Seven Wolves, Yi Erkang, Unified, and Jeanwest. It has helped Seven Wolves remove counterfeit goods with total sales of 25.5 million, and Yi Erkang counterfeit goods with total sales of 24.12 million. Before adopting this product, Yi Erkang incurred online anti-counterfeiting expenses every year, 4 million in 2013, with little effect. After cooperating with the company, the present invention was used to accurately collect and analyze, through the Chinese semantic big data analysis system, all comment data from online shops on the major e-commerce platforms, reject invalid comments, and automatically uncover the various infringing commodities in the e-commerce market. This greatly reduced the input of manpower and material resources, saved nearly 90% of the anti-counterfeiting cost, and improved the effectiveness of counterfeit control 30-fold.