CN113222471B


Info

Publication number: CN113222471B
Application number: CN202110623218.2A
Authority: CN
Inventors: 苏秦; 孙佰清; 房岳; 鲍鑫; 王璧; 刘莹
Original assignee: Xian Jiaotong University; Harbin Institute of Technology Shenzhen
Current assignee: Xian Jiaotong University; Harbin Institute of Technology Shenzhen
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2023-06-06
Anticipated expiration: 2041-06-04
Also published as: CN113222471A

Abstract

The invention relates to enterprise market value risk monitoring, and in particular to an asset risk control method and device based on new media data. The method comprises the following steps. Step 1: obtain an enterprise's financial data, new media public opinion data and transaction data from a server. Step 2: preprocess the financial data and transaction data, group the new media public opinion data by source and event subject, and preprocess the public opinion data. Step 3: input the financial data, transaction data and the corresponding entries of the new media traffic matrix for the enterprise under monitoring, and use a trained model to predict whether the market value fluctuation will exceed the safe range within a fixed future period. If the fluctuation exceeds the safe range, an early warning signal is issued.

Description


Asset risk control method and device based on new media data

Technical field

The present invention relates to enterprise market value risk monitoring, and more specifically to an asset risk control method and device based on new media data.

Background art

New media mainly refers to forms of communication that use digital and network technologies to deliver information and entertainment services to users through channels such as the Internet, broadband local area networks, wireless communication networks and satellites, and through terminals such as computers, mobile phones and digital televisions. Because new media are closer to the public and their content selection is more focused on public demand, corporate public opinion spreads more efficiently and has greater influence in new media than it did in the era of traditional media. In recent years, monitoring corporate public opinion in new media has become increasingly important to corporate public relations departments and matters for marketing, brand building, crisis response and market value management. Effective identification of public opinion content can substantially improve a listed company's understanding of its own reputation, reveal risk factors in its development, clarify the demands of its stakeholders, and ultimately serve the company's need to manage and control market value risk.

Although existing market value monitoring can track market value risk from enterprise market value data, it uses very few data dimensions and is poorly able to anticipate unexpected events.

In addition, existing public opinion monitoring products can perform data collection, content storage and querying, and basic analysis based on natural language processing, producing outputs such as sentiment level, topic volume and topic lifespan. However, because these outputs lack theoretical grounding in finance and communication studies, they are hard to map onto the needs of market value management and cannot give enterprise managers targeted recommendations.

Summary of the invention

The purpose of the present invention is to provide an asset risk control method and device based on new media data that collects and classifies an enterprise's new media public opinion data and identifies the potential risks in it, so as to give the enterprise a forecast of market value fluctuation based on current public opinion and thereby help it guard against market value risk more effectively.

The purpose of the present invention is achieved through the following technical solution:

An asset risk control method based on new media data, comprising the following steps:

Step 1: obtain the enterprise's financial data, new media public opinion data and transaction data from a server.

Step 2: set a safe range for daily market value fluctuation according to the enterprise's risk control requirements, and label the enterprise's market value risk: a fluctuation within the safe range is labeled 1, and otherwise 0. Preprocess the financial data and transaction data; group the new media public opinion data by source and event subject, and preprocess it. Then compute the traffic of each type of public opinion content and, using an emotion vocabulary ontology, the intensity of fine-grained emotions in that content (for example good, joy, sorrow, anger, surprise, fear and grief), and combine the traffic and emotion intensity of each content type into the enterprise's new media traffic matrix. Integrate the labeled market value risk, the financial data, the transaction data, and the traffic and emotion data from the new media traffic matrix into an asset risk control data set consisting of a training set and a test set, and feed the training set into a deep neural network for training.

Step 3: input the financial data, transaction data and corresponding new media traffic matrix entries of the enterprise under monitoring, and use the trained model to predict whether its market value fluctuation within a fixed future period will exceed the safe range. If the fluctuation exceeds the safe range, an early warning signal is issued.

In Step 1, one embodiment of this patent determines the safety threshold for enterprise market value fluctuation from two indicators: short-term stock price volatility and short-term idiosyncratic (firm-specific) risk. Based on the distributions of these two market value risk measures over the enterprise's past year, the values at the upper 25% quantile (the 75th percentile) of short-term stock price volatility and of short-term idiosyncratic risk are taken as the upper limits of the enterprise's safe range of market value fluctuation.
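A minimal sketch of the threshold rule, assuming each day already has a short-term volatility value and a short-term idiosyncratic-risk value. The text does not say how the two indicators are combined, so this sketch assumes a day counts as within the safe range only when both are at or below their caps. The percentile uses linear interpolation between order statistics; all function names are hypothetical.

```python
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty history")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def safe_range_caps(past_vol: Sequence[float], past_ivol: Sequence[float]) -> Tuple[float, float]:
    """Upper limits of the safe range: 75th percentile of each past-year measure."""
    return percentile(past_vol, 75), percentile(past_ivol, 75)


def label_day(vol: float, ivol: float, caps: Tuple[float, float]) -> int:
    """1 = fluctuation within the safe range, 0 = outside (assumed: both measures must pass)."""
    vol_cap, ivol_cap = caps
    return 1 if vol <= vol_cap and ivol <= ivol_cap else 0
```

In practice the caps would be recomputed on a rolling one-year window so that the label for each day uses only the preceding year of history.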

In Step 2, the preprocessing of financial data and transaction data includes the following operations:

The enterprise's financial data are updated in the database each fiscal quarter and include key indicators such as the amount of discretionary accruals, return on total assets, asset size, number of employees, ownership type and current liabilities.

The enterprise's transaction data are updated in the database daily and include key indicators such as market value, turnover rate and daily return.

The data are then realigned to a daily frequency, missing values are filled with a moving average, and a multi-dimensional feature vector is built from the financial and transaction features:

$$X = \langle f_1, \ldots, f_{N_1}, t_1, \ldots, t_{N_2} \rangle$$

where f denotes the financial indicators (N₁ dimensions) and t denotes the transaction indicators (N₂ dimensions).
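The daily realignment, moving-average filling and concatenation can be sketched as follows. The trailing 3-value window over already-filled values, the day-index dictionary input and the function names are assumptions for illustration; the text does not specify them.

```python
from typing import Dict, List, Optional, Sequence


def align_daily(observations: Dict[int, float], n_days: int) -> List[Optional[float]]:
    """Place observations on a day grid 0..n_days-1; days without data become None."""
    return [observations.get(day) for day in range(n_days)]


def moving_average_fill(series: Sequence[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """Fill each None with the mean of up to `window` preceding (already filled) values."""
    out: List[Optional[float]] = []
    for value in series:
        if value is None:
            recent = [x for x in out[-window:] if x is not None]
            value = sum(recent) / len(recent) if recent else None  # leading gaps stay None
        out.append(value)
    return out


def build_feature_vector(financial: Sequence[float], transaction: Sequence[float]) -> List[float]:
    """X = <f_1..f_N1, t_1..t_N2>: financial indicators followed by transaction indicators."""
    return list(financial) + list(transaction)
```

For example, a series observed on days 0 and 2 of a 4-day grid becomes [1.0, None, 3.0, None] after alignment, and the gaps are then filled from the preceding values.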

In Step 2, the sources of the public opinion data are grouped into five categories: official media, mainstream commercial media, high-influence financial self-media, high-influence non-financial self-media, and ordinary self-media.

The influence score of a self-media account is derived from indicators such as follower count, content update frequency and average readership per item.

Official media are the officially controlled outlets represented by China Securities Journal, Securities Daily, Securities Times and Shanghai Securities News; their new media data are the accounts operated by these organizations and their subsidiaries.

Mainstream commercial media are market-oriented outlets such as China Business Journal, First Financial Daily, The Economic Observer and 21st Century Business Herald; their new media data are the accounts operated by these organizations and their subsidiaries.

All other accounts are treated as self-media accounts, including accounts operated by the enterprise itself.

The influence of a self-media account is computed as follows:

(1) Using the self-media name, update its follower count on each new media platform daily.

(2) Across the group of accounts with the same name on the different platforms, compute the daily average update frequency per platform and the average readership of content published in the past week.

(3) Classify the updated content by format (text, audio and video), count the daily updates of each, and construct the influence feature vector:

$$media_{i,t} = \langle fans_{i,t},\ frequency_{i,t},\ ave\_volume_{i,t},\ text\_frequency_{i,t},\ audio\_frequency_{i,t},\ video\_frequency_{i,t} \rangle$$

where i indexes the enterprise and t the time step at which the features are updated.

(4) Based on these indicators, the DBSCAN clustering algorithm splits self-media accounts into two classes according to the density of the feature distribution: accounts with more followers, more frequent updates and higher readership form the high-influence class, and the rest form the low-influence class.
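The text does not give DBSCAN parameters or explain how DBSCAN's output (any number of clusters plus noise) is reduced to exactly two classes. The sketch below, with hypothetical function names, assumes z-score standardization of the features (for example the six components of media_{i,t}), a plain O(n²) DBSCAN, and a rule that the cluster with the highest mean standardized score is the high-influence class while all other points, including noise, are low-influence. The eps and min_samples values are placeholders.

```python
import math
from typing import List, Optional, Sequence


def standardize(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column so no single indicator dominates the distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(p, means, stds)] for p in points]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN. Returns a cluster id per point; -1 marks noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_samples:
                queue.extend(j_seeds)  # j is a core point: keep expanding
    return [lab if lab is not None else -1 for lab in labels]


def split_by_influence(features: Sequence[Sequence[float]], eps: float = 0.5,
                       min_samples: int = 2) -> List[str]:
    """'high' for the cluster with the highest mean standardized score, 'low' otherwise."""
    z = standardize(features)
    labels = dbscan(z, eps, min_samples)
    clusters = sorted({lab for lab in labels if lab >= 0})
    if not clusters:
        return ["low"] * len(features)

    def mean_score(c: int) -> float:
        members = [p for p, lab in zip(z, labels) if lab == c]
        return sum(sum(p) for p in members) / len(members)

    high = max(clusters, key=mean_score)
    return ["high" if lab == high else "low" for lab in labels]
```

Standardizing first matters here because follower counts and update frequencies live on very different scales, and DBSCAN's eps is a single distance threshold.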

Whether a self-media account is high-influence financial self-media is determined as follows:

(1) Collect the past month's content of each self-media account in the high-influence class, remove stop words and segment it into words; this embodiment uses the output of the jieba word segmentation package.

(2) Build a keyword lexicon from the corporate performance and financial indicators that appear in listed companies' annual report disclosures, and compare the segmented self-media content against it. If the number of distinct overlapping keywords exceeds 20% of the number of distinct keywords in the segmented content, and the total frequency of the overlapping keywords exceeds 10% of the total word frequency of the segmented content, the account is classified as financial self-media.

(3) Repeat this procedure monthly to update whether each self-media account belongs to the financial category.
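A sketch of the two-threshold test in step (2), assuming the input tokens are already stop-word-filtered segmentation output (for example from jieba) and interpreting the first ratio over distinct words and the second over total word frequency. The function name and signature are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def is_financial_self_media(tokens: Iterable[str], finance_keywords: Set[str],
                            distinct_share: float = 0.20, freq_share: float = 0.10) -> bool:
    """Apply the 20% distinct-keyword and 10% word-frequency overlap thresholds."""
    counts = Counter(tokens)
    if not counts:
        return False
    overlap = {word for word in counts if word in finance_keywords}
    distinct_ratio = len(overlap) / len(counts)                         # share of distinct words
    freq_ratio = sum(counts[w] for w in overlap) / sum(counts.values())  # share of all occurrences
    return distinct_ratio > distinct_share and freq_ratio > freq_share
```

Both conditions must hold, so an account that mentions one financial term among many unrelated posts passes the distinct-word test only if its vocabulary is small, and fails the frequency test unless that term is used often.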

Source information is updated as follows:

For a new self-media account that appears during data collection, first compute its influence. If it does not fall into the high-influence class, it is classified as ordinary self-media. If it does, determine whether it is financial self-media: if so, it is classified as high-influence financial self-media; otherwise, as high-influence non-financial self-media.
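The decision order for a new account reduces to a small cascade. In this sketch the two boolean inputs are assumed to come from the influence clustering and the financial-category test described above; the function name and the returned strings are hypothetical.

```python
def classify_new_account(is_high_influence: bool, is_financial: bool) -> str:
    """Influence is checked first; the financial test only matters for high-influence accounts."""
    if not is_high_influence:
        return "ordinary self-media"
    if is_financial:
        return "high-influence financial self-media"
    return "high-influence non-financial self-media"
```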

In Step 2, the preprocessing of new media data involves the following computation:

(1) Remove stop words from the new media content, segment it into words, and sort the segmentation results by word frequency.

(2) From the segmentation results, count the total number of keywords belonging to the corporate performance and financial analysis category; call this count L1.

(3) From the segmentation results, count the total number of keywords belonging to the executive and key personnel actions category; call this count L2.

(4) From the segmentation results, count the total number of keywords belonging to the corporate marketing business category; call this count L3.

If L1, L2 and L3 are all 0, the content belongs to no category; it is discarded and not assigned to the new media traffic matrix for further computation.

If L1 exceeds L2 + L3, the content is assigned to the corporate performance and financial analysis category; likewise, if L2 exceeds L1 + L3, it is assigned to the executive and key personnel actions category, and if L3 exceeds L1 + L2, it is assigned to the corporate marketing business category.
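A sketch of the L1/L2/L3 assignment rule, assuming each lexicon is a set of words and the tokens are segmented content. The text does not say what happens when no count exceeds the sum of the other two (for example L1 = L2 = 1, L3 = 0); this sketch returns None in that case as well, which is an assumption. Category keys and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

CATEGORIES = ("performance_finance", "executive_actions", "marketing_business")


def classify_content(tokens: Iterable[str], lexicons: Dict[str, Set[str]]) -> Optional[str]:
    """Return the category whose keyword count exceeds the other two combined, else None."""
    counts = Counter(tokens)
    totals = {cat: sum(n for word, n in counts.items() if word in lexicons[cat])
              for cat in CATEGORIES}  # L1, L2, L3
    grand_total = sum(totals.values())
    if grand_total == 0:
        return None  # no category: discard, not added to the traffic matrix
    for cat in CATEGORIES:
        if totals[cat] > grand_total - totals[cat]:  # L_k > sum of the other two
            return cat
    return None  # no strict majority (case not specified in the text)
```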

This preprocessing requires building the lexicons used to classify new media content, as follows:

(1) Keywords for the corporate performance and financial analysis category consist of the keywords for financial information in listed companies' quarterly and annual reports, guided by the policy and regulatory documents on corporate information disclosure issued by the China Securities Regulatory Commission and other regulators.

(2) Keywords for executive and key personnel actions correspond to the key personnel named in listed companies' disclosure documents, especially the senior managers and board members listed in the annual report.

(3) Keywords for the corporate marketing business category consist of the key products and businesses that generate operating revenue, as reported in listed companies' quarterly and annual reports.

The preprocessing also requires updating these lexicons, as follows:

(1) A new term appearing in new media content is added to the corporate performance and financial analysis category if it comes from the policy and regulatory documents on corporate disclosure issued by the China Securities Regulatory Commission or other regulators.

(2) When an enterprise announces a change in key personnel, the new person's name is added to the executive and key personnel category and replaces the previous holder of that position.

(3) For the corporate marketing business category, a new term that is a nickname or alias used by stakeholders for the enterprise, its main business or its products, such as "米忽悠" ("Mihuyou") for miHoYo, is added directly to that category's lexicon.
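The three update rules can be sketched as simple set operations. The lexicon keys, the roster dictionary mapping a position to its current holder, and the set of regulatory terms are hypothetical structures chosen for illustration.

```python
from typing import Dict, Set

Lexicons = Dict[str, Set[str]]


def add_regulatory_term(lexicons: Lexicons, term: str, regulatory_terms: Set[str]) -> None:
    """Rule (1): terms drawn from regulatory disclosure documents join the finance lexicon."""
    if term in regulatory_terms:
        lexicons["performance_finance"].add(term)


def replace_officer(lexicons: Lexicons, roster: Dict[str, str], position: str, new_name: str) -> None:
    """Rule (2): the new holder of a position replaces the previous one."""
    old_name = roster.get(position)
    roster[position] = new_name
    lexicons["executive_actions"].add(new_name)
    # Drop the old name only if that person holds no other listed position.
    if old_name is not None and old_name not in roster.values():
        lexicons["executive_actions"].discard(old_name)


def add_alias(lexicons: Lexicons, alias: str) -> None:
    """Rule (3): stakeholder nicknames for the company or its products join the marketing lexicon."""
    lexicons["marketing_business"].add(alias)
```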

In summary, the new media traffic matrix is framed by the 5 sources and 3 content categories of new media content; the traffic and emotion intensity of each cell are then computed.

In Step 2, the emotion intensity of new media data is computed as follows:

(1) For the content in each cell of the new media traffic matrix, the segmented emotion words are matched, using natural language processing techniques, against the localized Ekman emotion vocabulary ontology developed by Dalian University of Technology, and the frequency $e_{i,j,k}$ of emotion words in each emotion category is counted for each matrix cell, where $e_{i,j,k} \in E$, i indexes the content source, j indexes the content category, and k indexes the emotion type (seven types: good, joy, sorrow, anger, surprise, fear and grief).

(2) The emotion intensity $ed_{i,j,k}$ of each matrix cell is the share of emotion words of category k among all emotion words in that cell's content:

$$ed_{i,j,k} = \frac{e_{i,j,k}}{\sum_{k'} e_{i,j,k'}}$$

This completes the construction of the new media traffic matrix.
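A sketch of the 5 × 3 matrix construction, assuming each content item has already been assigned a source and a category, that traffic is the number of texts per cell, and that the emotion lexicon maps a word to a single one of the seven emotion types. All identifiers are hypothetical, and the toy lexicon in the usage check below stands in for the actual ontology, whose entries are not reproduced here.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

SOURCES = ("official", "mainstream_commercial", "high_influence_financial",
           "high_influence_non_financial", "ordinary")
CATEGORIES = ("performance_finance", "executive_actions", "marketing_business")
EMOTIONS = ("good", "joy", "sorrow", "anger", "surprise", "fear", "grief")


def build_traffic_matrix(items: Iterable[Tuple[str, str, Sequence[str]]],
                         emotion_lexicon: Dict[str, str]):
    """Return traffic[i][j] (text count) and intensity[i][j][k] = e_ijk / sum_k e_ijk."""
    traffic: List[List[int]] = [[0] * len(CATEGORIES) for _ in SOURCES]
    freq = [[[0] * len(EMOTIONS) for _ in CATEGORIES] for _ in SOURCES]  # e_{i,j,k}
    for source, category, tokens in items:
        i, j = SOURCES.index(source), CATEGORIES.index(category)
        traffic[i][j] += 1
        for tok in tokens:
            emotion = emotion_lexicon.get(tok)
            if emotion is not None:
                freq[i][j][EMOTIONS.index(emotion)] += 1
    intensity = [
        [[c / sum(cell) if sum(cell) else 0.0 for c in cell] for cell in row]  # ed_{i,j,k}
        for row in freq
    ]
    return traffic, intensity
```

Cells with no emotion words get all-zero intensities rather than a division by zero.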

Step 2 further includes training the neural network model. The enterprise's financial data, transaction data, and the traffic and emotion data from the new media traffic matrix are represented as a multi-dimensional feature vector $M_t$, and a long short-term memory (LSTM) network is used. The model contains an input gate, an output gate, a forget gate and an internal memory cell. The loss function is the logarithmic loss:

$$L(Y, P(Y \mid M)) = -\log_2 P(Y \mid M)$$

where Y is the label indicating whether the market value fluctuation exceeds the safe range and P(Y|M) is the probability the model assigns to that label.
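The base-2 logarithmic loss can be written directly. Here p_one is assumed to be the model's predicted probability of label 1, so P(Y|M) is p_one when Y = 1 and 1 - p_one otherwise; the eps clamp is an added guard against log(0), and the function names are hypothetical.

```python
import math
from typing import Sequence


def log2_loss(y: int, p_one: float, eps: float = 1e-12) -> float:
    """L(Y, P(Y|M)) = -log2 P(Y|M) for a single sample."""
    p_true = p_one if y == 1 else 1.0 - p_one
    p_true = min(max(p_true, eps), 1.0)
    return -math.log2(p_true)


def mean_log2_loss(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Average loss over a batch of samples."""
    return sum(log2_loss(y, p) for y, p in zip(labels, probs)) / len(labels)
```

Using base 2 instead of the natural log only rescales the loss by a constant factor of 1/ln 2, so it does not change which model minimizes it.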

Using the current input $M_t$ and the hidden state $H_{t-1}$ passed from the previous step, the concatenated vector is passed through activation functions with trained weights to obtain the following four states:

(1) Input gate;

(2) Information gating;

(3) Forget gating;

(4) Output.
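For illustration, the following is a sketch of one forward step of a standard (textbook) LSTM cell using the same four-state naming (z_i for the input gate, z for information gating, z_f for forget gating, z_o for the output) on the concatenation [M_t, H_{t-1}]. It may differ from the exact formulas of this embodiment. The parameter names (W_i, b_i, and so on), the helper functions and the sigmoid read-out in predict_proba are assumptions, and training by backpropagation is omitted; in practice a deep learning framework would handle that.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Params = Dict[str, list]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: List[Vector], b: Vector, x: Sequence[float]) -> Vector:
    """Return W @ x + b using plain nested lists."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def init_params(input_dim: int, hidden_dim: int, scale: float = 0.1, seed: int = 0) -> Params:
    """Random weights for the four states; each acts on [M_t, H_{t-1}]."""
    rng = random.Random(seed)
    n = input_dim + hidden_dim
    p: Params = {}
    for g in ("i", "z", "f", "o"):
        p["W_" + g] = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(hidden_dim)]
        p["b_" + g] = [0.0] * hidden_dim
    return p


def lstm_step(m_t: Sequence[float], h_prev: Vector, c_prev: Vector, p: Params) -> Tuple[Vector, Vector]:
    x = list(m_t) + list(h_prev)                                  # concatenate [M_t, H_{t-1}]
    z_i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], x)]     # input gate
    z = [math.tanh(a) for a in affine(p["W_z"], p["b_z"], x)]     # information gating
    z_f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], x)]     # forget gating
    z_o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], x)]     # output
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]  # memory cell update
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]               # new hidden state H_t
    return h_t, c_t


def predict_proba(sequence: Sequence[Sequence[float]], p: Params,
                  w_out: Vector, b_out: float) -> float:
    """Run the cell over daily vectors M_1..M_T and map H_T to P(label = 1)."""
    hidden_dim = len(p["b_i"])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for m_t in sequence:
        h, c = lstm_step(m_t, h, c, p)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)
```

With all weights at zero every gate outputs 0.5 and the candidate is 0, so the cell simply halves its memory each step; this makes the forward pass easy to check by hand.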

When the loss between the output $z^o$ and the true labels reaches its minimum, the model at that point is selected as the trained LSTM network, and the weight matrices of each type of unit are saved.

As one implementation, the step of classifying the public opinion data by source and event subject, constructing the enterprise's new media traffic matrix, and computing the traffic and emotional tendency within the matrix includes:

Verifying the source of each day's new media public opinion data against the system's self-built subject-source database, classifying it into five types (official media, mainstream commercial media, influential financial self-media, other high-influence self-media, and ordinary self-media), and updating the self-media entries in the subject-source database at the same time.

Removing stop words from the new media data and segmenting it into words; based on the segmentation results, assigning the main event of each item to one of three categories (corporate performance and financial analysis, executive and key personnel actions, and corporate marketing business), and combining this with the source classification to complete the new media traffic matrix. At the same time, new words appearing in the segmentation results are organized and the lexicons for the event subjects are updated.

Computing the traffic of the public opinion data assigned to each cell of the matrix, using natural language processing to determine the share of each emotion type within each cell, and storing the data in order from subject source to event subject, and from traffic to the emotion intensities associated with that traffic.

The parameters of the market value fluctuation prediction model are obtained by training the long short-term memory (LSTM) network on each group of enterprise sample data, using the features derived from the financial data, transaction data and new media traffic matrix together with the label indicating whether the market value fluctuation is within the safe range. The LSTM learns the relationship between these features and the labels, which yields the trained LSTM model.

An asset risk control device based on new media data comprises a processing module, a storage module, and a market value fluctuation early warning device stored in the storage module.

The market value fluctuation early warning device includes an acquisition unit for acquiring the training data set and a training unit for training the neural network model with the training data set.

The early warning device further includes a test unit, which tests the test samples with the trained neural network model, and an optimization unit, which optimizes the model according to the difference between the predicted label (whether the market value fluctuation is within the safe range) and the true label in the test results.

A readable storage medium for asset risk control based on new media data stores a computer program which, when run on a computer, causes the computer to perform the enterprise market value risk monitoring method based on new media data.

The beneficial effects of the asset risk control method and device based on new media data of the present invention are as follows:

In the asset risk control method based on new media data of the present invention, the ease of information transmission and the breadth of audiences and topics in new media bring both opportunities and challenges to strategies such as corporate image management and marketing. New media content that reaches stakeholders also puts market pressure on the enterprise and causes its market value to fluctuate. Integrating new media content with corporate financial data and market value data therefore covers the main external risk sources facing an enterprise more broadly; using a neural network model to predict whether market value fluctuation will exceed the risk threshold effectively helps the enterprise respond; and the new media traffic matrix designed in this patent shows, at the source, the enterprise's new media influence across topics and creator types, providing a targeted basis for analyzing subsequent risk-blocking measures.

The market value fluctuation early warning device can be applied in the asset risk control device based on new media data to perform each step of the enterprise market value risk monitoring method based on new media data.

The asset risk control device based on new media data can obtain the enterprise's current financial data, market value data and new media data of all kinds, and then predict whether the market value fluctuation for the corresponding day exceeds the safe range, which can improve the accuracy and reliability of the enterprise's market value risk prediction.

Description of drawings

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic diagram of the market value fluctuation early warning device of the present invention;

Fig. 2 is a first schematic diagram of the asset risk control method based on new media data of the present invention;

Fig. 3 is a second schematic diagram of the asset risk control method based on new media data of the present invention;

Fig. 4 is a schematic diagram of the asset risk control device based on new media data of the present invention.

In the figures:

Electronic device 10;

Processing module 11;

Storage module 12;

Market value fluctuation warning device 100;

Acquisition unit 110;

Training unit 120.

Detailed description of the embodiments

The present invention is described in further detail below with reference to Figs. 1 to 4.

In the asset risk control device based on new media data, the enterprise market value risk monitoring method based on new media data can be applied to the electronic device 10 described above, which executes or implements each step of the method.

With reference to Figs. 2 and 3, the method may include the following steps:

Step S210: obtain a training data set. The training data set includes multiple groups of sample data; each group includes enterprise financial data, market value data and new media data corresponding to multiple time series. For some of the groups, the data include the enterprise data features described above together with a data label derived from whether the enterprise's historical market value fluctuation lay within the safe range defined by the fluctuation threshold: a fluctuation within the safe range is labeled 1, and otherwise 0.

Step S220: train the neural network model with the training data set to obtain a trained neural network model that predicts whether the market value fluctuation in a target time window after the current moment will be within the safe range.

In the above embodiment, the sample data in the training data set include corporate public opinion data obtained from new media channels. This enriches the diversity and timeliness of the sample data, which helps the trained neural network model predict more accurately and reliably whether the market value fluctuation will be within the safe range, and mitigates the low accuracy and reliability that result when the sample data rely on market value data alone.

Each step of the method is described in detail below.

In step S210, the training data set is prepared before the neural network model is trained. It may be stored in the electronic device 10, or stored in another device from which the electronic device 10 obtains it. The number of sample groups in the training data set is usually large and can be set according to actual conditions.

In each group of sample data, the data features are the enterprise market value data and new media content data collected over the network at different points of the time series. The data label indicates whether the market value fluctuation at a time point different from that of the features is within the safe range defined by the fluctuation threshold: 1 if it is, and 0 otherwise.
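A sketch of how daily feature vectors and later labels might be paired into training samples. It assumes a fixed look-back window and a fixed forecast horizon in days; both are hypothetical parameters, since the text only says the label belongs to a time point different from that of the features.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[List[float]], int]


def make_samples(features: Sequence[Sequence[float]], labels: Sequence[int],
                 window: int, horizon: int) -> List[Sample]:
    """Pair each run of `window` daily vectors with the label `horizon` days after the run ends."""
    samples: List[Sample] = []
    for end in range(window, len(features) - horizon + 1):
        x = [list(v) for v in features[end - window:end]]  # days end-window .. end-1
        y = labels[end + horizon - 1]                      # label strictly after the window
        samples.append((x, y))
    return samples
```

Because every label lies after its feature window, a chronological split of the resulting samples into training and test sets avoids leaking future information into training.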

A new media traffic matrix is constructed from the new media data associated with the enterprise. Stop words are removed from the new media content, the text is segmented into words, and the segmentation results are sorted by word frequency. From the segmentation results, three keyword totals are counted:

- L1, the number of keywords belonging to the enterprise performance and financial analysis category;
- L2, the number of keywords belonging to the actions of executives and key personnel;
- L3, the number of keywords belonging to the enterprise marketing business.

If L1, L2 and L3 are all 0, the content belongs to no category and is not assigned to the new media traffic matrix for further calculation.
If L1 is greater than the sum of L2 and L3, the content is assigned to the enterprise performance and financial analysis category. Likewise, if L2 is greater than L1 + L3, it is assigned to the executives and key personnel actions category, and if L3 is greater than L1 + L2, it is assigned to the enterprise marketing business category.

For a new self-media account that appears during data collection, its self-media influence is calculated first. If it does not qualify as high-influence, it is classified as ordinary self-media. If it does, it is further checked for being a financial self-media account. It is defined as high-influence financial self-media if it meets that condition, and as high-influence non-financial self-media otherwise.

For each element of the new media traffic matrix, the number of texts and their emotional tendency are calculated. The emotional tendency is computed in two steps.

1. Following the content of the new media traffic matrix, the segmented words are matched against the Ekman-based emotion ontology lexicon that Dalian University of Technology developed for Chinese-language use. This gives, for every matrix element, the frequency of emotion words in each emotion category.
2. The frequency of each emotion category is divided by the total number of emotion words in the corresponding content. The resulting proportions are the emotional tendency of that matrix element.
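The classification, source-tiering and emotion-share rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical placeholders, because the text does not publish its vocabulary library, its influence formula or the contents of the Dalian University of Technology lexicon:

- the keyword sets;
- the emotion lexicon entries;
- the category names;
- the influence threshold of 0.5.

The input is assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical keyword sets (the patent's vocabulary library is not published).
CATEGORY_KEYWORDS = {
    "performance_finance": {"revenue", "profit", "earnings"},   # counted as L1
    "executive_actions": {"chairman", "executive", "resigns"},  # counted as L2
    "marketing_business": {"launch", "promotion", "channel"},   # counted as L3
}

# Hypothetical miniature lexicon (word -> emotion category) standing in for
# the Ekman-based emotion ontology lexicon.
EMOTION_LEXICON = {"surge": "joy", "great": "joy", "scandal": "anger",
                   "plunge": "fear", "shock": "surprise"}


def classify_content(tokens, stop_words=frozenset()):
    """Return the event category of one segmented text, or None if unassigned."""
    freq = Counter(t for t in tokens if t not in stop_words)
    counts = {cat: sum(freq[w] for w in words)
              for cat, words in CATEGORY_KEYWORDS.items()}
    total = sum(counts.values())
    if total == 0:
        return None  # L1 = L2 = L3 = 0: excluded from the traffic matrix
    for cat, n in counts.items():
        if n > total - n:  # e.g. L1 > L2 + L3
            return cat
    return None  # no category exceeds the other two combined; left unassigned


def classify_source(influence, is_financial, threshold=0.5):
    """Tier a newly seen self-media account (threshold is hypothetical)."""
    if influence < threshold:
        return "ordinary self-media"
    if is_financial:
        return "high-influence financial self-media"
    return "high-influence non-financial self-media"


def emotion_tendency(token_lists):
    """Share of each emotion category among all emotion words of one matrix element."""
    freq = Counter(EMOTION_LEXICON[t] for tokens in token_lists
                   for t in tokens if t in EMOTION_LEXICON)
    total = sum(freq.values())
    return {emo: n / total for emo, n in freq.items()} if total else {}
```

For a matrix element holding several texts, the text count is simply the number of token lists. For example, `emotion_tendency([["surge", "great"], ["scandal", "plunge"]])` returns `{"joy": 0.5, "anger": 0.25, "fear": 0.25}`. Returning None when no category holds a strict majority is a design choice: the text does not say how such ties are handled.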

In this embodiment, step S210 may include sub-steps S211 and S212, as follows:

Sub-step S211: obtain multiple groups of data through a sliding window from the enterprise finance, market value and new media content data set collected at a specified collection frequency. Each group contains multiple values that are consecutive in the collected time series;

Sub-step S212: among the groups of enterprise finance, market value and new media content data, oversample those whose market value fluctuation range exceeds the set safe range of the market value fluctuation threshold.

In this embodiment, the electronic device 10 may collect the market value data and new media content data of the enterprise periodically at a specified collection frequency. These form the enterprise finance, market value and new media content data set. The number of enterprises collected is determined according to actual conditions and may be one enterprise or several associated enterprises.

In this data set, the enterprise market value and new media content data are associated with a time series, which can be understood as the timestamps at which the data were collected. A sliding window then extracts, from the enterprise finance, market value and new media content data, the historical data corresponding to each data label.

Consider an example of the training data. Take a 30-day sliding window, a daily collection frequency and a prediction horizon that is also measured in days. The enterprise finance, market value and new media content data of the past 30 days form the data features. Whether the market value fluctuation range on the 30th day is within the safe range of the market value fluctuation threshold forms the data label. Generally, the training data cover a time span measured in years, and sliding the time window across that span yields the training data set.
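The sketch below shows the sliding-window construction of sub-step S211 and the 0/1 labelling rule of step S210. It assumes each day's finance, market value and new media features have already been flattened into one numeric vector. The function names, the ±5% safe range and the `horizon` parameter are illustrative assumptions; `horizon` is the number of days between the last day of the window and the labelled day. Setting `horizon=0` labels the last day of the window itself, which matches a literal reading of the 30-day example. Setting `horizon=1` labels the following day.

```python
def in_safe_range(fluctuation, safe_low, safe_high):
    """Data label: 1 if the fluctuation lies inside the safe range, else 0."""
    return 1 if safe_low <= fluctuation <= safe_high else 0


def build_samples(rows, fluctuations, window=30, horizon=1,
                  safe_low=-0.05, safe_high=0.05):
    """Slide a fixed-length window over daily records.

    rows[t] is the feature vector collected on day t (finance, market value,
    new media traffic and emotion); fluctuations[t] is the market value
    fluctuation observed on day t. Each sample pairs `window` consecutive
    feature vectors with the label of the day `horizon` days after the
    last day of the window.
    """
    samples = []
    for start in range(len(rows) - window - horizon + 1):
        end = start + window            # exclusive end of the feature window
        target = end - 1 + horizon      # index of the labelled day
        label = in_safe_range(fluctuations[target], safe_low, safe_high)
        samples.append((rows[start:end], label))
    return samples
```

For instance, five daily records with a 2-day window and `horizon=1` yield three samples. A record whose labelled day has a 10% fluctuation receives label 0.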

In other implementations, the sliding window length, collection frequency, prediction horizon and training data time span can be set according to actual conditions. They are not specifically limited here.

Samples whose market value fluctuation range exceeds the set safe range of the market value fluctuation threshold are oversampled to amplify their effect. For example, an oversampling ratio of 1:2 doubles the number of samples that call for an early warning. This has three effects:

- it strengthens the model's ability to describe such samples;
- it eases the data imbalance;
- it improves the validity and reliability of the training results.
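Below is a minimal sketch of the 1:2 oversampling. It assumes samples are (features, label) pairs labelled as in step S210, so that label 0 marks a fluctuation outside the safe range. Each such sample is simply duplicated. The function name and the optional seeded shuffle are assumptions rather than details given in the text.

```python
import random


def oversample_warnings(samples, factor=2, seed=None):
    """Repeat every sample labelled 0 (fluctuation outside the safe range)
    `factor` times; factor=2 matches the 1:2 example in the text."""
    out = []
    for features, label in samples:
        copies = factor if label == 0 else 1
        out.extend([(features, label)] * copies)
    if seed is not None:
        random.Random(seed).shuffle(out)  # optional reproducible shuffle
    return out
```

Duplicating samples keeps every original record intact. A synthetic method such as SMOTE would be a different technique from the one described here.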

In step S220, once the training data set has been obtained, each group of sample data in it can be used directly to train the neural network model. The neural network model may be, but is not limited to, a deep neural network model or an artificial neural network model. It may include an input layer, a recurrent layer and a fully connected layer that learn from each group of sample data, yielding the trained neural network model.

Specifically, in step S220 each group of sample data in the training data set can be used directly to train a long short-term memory (LSTM) neural network model. The LSTM model is a variant of the recurrent neural network (RNN). It may include an input layer, a recurrent layer and a fully connected layer that learn from each group of sample data, yielding the trained LSTM model.

In this embodiment, step S220 may include training the long short-term memory (LSTM) neural network model with the multiple data features and the data label of each group of sample data. The model thereby learns the relationship between the data features and the data label, yielding the trained LSTM model. The LSTM is a variant of the recurrent neural network (RNN).

The corporate financial data, the transaction data, and the traffic and emotion data of the new media traffic matrix are represented as a multidimensional feature vector. This vector is fed to the LSTM model, which contains an input gate, an output gate, a forget gate and an internal memory cell. The loss function is the logarithmic loss:

L(Y, P(Y|M)) = -log₂ P(Y|M)

where Y is the label indicating whether the market value fluctuation range exceeds the safe range, and P(Y|M) is the probability the model assigns to that label.
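The logarithmic loss above can be computed per sample as below. This sketch assumes the model outputs `p_safe`, the probability that the fluctuation is within the safe range, and it uses the labelling of step S210 (1 = within the safe range). The loss does not depend on which class is coded as 1, because it only uses the probability assigned to the true label. The clipping constant `eps` is an added assumption that keeps log₂ finite.

```python
import math


def log2_loss(y, p_safe, eps=1e-12):
    """Base-2 log loss -log2 P(Y|M) for one sample.

    y: 1 if the fluctuation is within the safe range, else 0.
    p_safe: model probability that the fluctuation is within the safe range.
    """
    p = p_safe if y == 1 else 1.0 - p_safe  # probability of the true label
    p = min(max(p, eps), 1.0)               # clip to avoid log2(0)
    return -math.log2(p)


def mean_log2_loss(labels, probs):
    """Average loss over a batch of labels and predicted probabilities."""
    return sum(log2_loss(y, p) for y, p in zip(labels, probs)) / len(labels)
```

For example, `log2_loss(1, 0.5)` gives 1.0 bit and `log2_loss(0, 0.75)` gives 2.0 bits.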

[Equations: input gate; information gating; forget gating; output]
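The gate equations are not reproduced in this text, so the sketch below assumes the standard LSTM cell:

- input gate i = σ(W_i x + U_i h + b_i);
- forget gate f and output gate o, defined in the same way;
- candidate memory g = tanh(W_g x + U_g h + b_g);
- memory update c = f ⊙ c_prev + i ⊙ g;
- hidden state h = o ⊙ tanh(c).

Reading "information gating" as the candidate-memory term g is itself an assumption. The final sigmoid unit stands in for the fully connected output layer. Weights are plain nested lists, and all names are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _affine(W, U, b, x, h):
    """Row-wise W x + U h + b for plain nested lists."""
    return [sum(wj * xj for wj, xj in zip(W_row, x))
            + sum(uj * hj for uj, hj in zip(U_row, h)) + bk
            for W_row, U_row, bk in zip(W, U, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps gate names 'i', 'f', 'o', 'g' to (W, U, b) triples.
    """
    i = [_sigmoid(z) for z in _affine(*params["i"], x, h_prev)]   # input gate
    f = [_sigmoid(z) for z in _affine(*params["f"], x, h_prev)]   # forget gate
    o = [_sigmoid(z) for z in _affine(*params["o"], x, h_prev)]   # output gate
    g = [math.tanh(z) for z in _affine(*params["g"], x, h_prev)]  # candidate memory
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # memory cell
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def predict_safe_probability(sequence, params, w_out, b_out, hidden_size):
    """Run the cell over a window and map the last hidden state to P(safe)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    # single-unit fully connected layer with a sigmoid output
    return _sigmoid(sum(wk * hk for wk, hk in zip(w_out, h)) + b_out)
```

In practice, a framework LSTM layer would replace this hand-written cell. The sketch only makes the gate arithmetic explicit.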

During training, the multiple data features and the data label of each group of sample data are input into the neural network model. Its input layer, recurrent layer and fully connected layer learn from them and capture the relationship between the data features and the data label in each group. The model thereby becomes able to predict, from multiple data features, whether the market value fluctuation range at the next time step or at another time point is within the safe range. The result is the trained neural network model.

As an optional implementation, the method may further include a step of testing and optimizing the neural network model after step S220. For example, after step S220 the method may further include:

testing the trained neural network model with test samples to obtain test results. The test samples include multiple test data features that are consecutive in the time series, together with test data labels. The test results include the market value fluctuation range corresponding to the time series of the test data labels;

optimizing the neural network model through its preset loss function. The optimization is driven by the difference between the test result (whether the market value fluctuation range is within the safe range) and the true value of the test data label. The result is the neural network model used to predict the market value fluctuation range.
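One way to turn the comparison between test results and test labels into numbers is sketched below. It reports accuracy at a 0.5 decision threshold together with the mean base-2 log loss used during training. The threshold and the choice of accuracy as a second metric are assumptions. The text only specifies that the preset loss function drives the optimization.

```python
import math


def evaluate(labels, probs, threshold=0.5, eps=1e-12):
    """Compare test-set predictions with the true labels.

    labels: 1 if the fluctuation was actually within the safe range, else 0.
    probs: model probabilities of being within the safe range.
    Returns (accuracy, mean base-2 log loss).
    """
    correct = 0
    loss = 0.0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        correct += predicted == y                 # bool counts as 0 or 1
        p_true = p if y == 1 else 1.0 - p         # probability of the true label
        loss += -math.log2(min(max(p_true, eps), 1.0))
    n = len(labels)
    return correct / n, loss / n
```

A lower mean loss on the test set could then guide the choice between loss settings or model variants, as the next paragraph suggests.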

The trained neural network model can be tested and optimized by adjusting the loss function or trying different neural network model implementations. This helps improve the accuracy and reliability of its market value fluctuation predictions.

Referring to FIG. 3, after the trained deep neural network model has been obtained, the method may further include a step of using the model to predict market value fluctuation. For example, after step S220 the method may further include steps S230 and S240, as follows:

Step S230: obtain the required enterprise finance, market value and new media content data for the preset period before the current moment;

Step S240: input the enterprise finance, market value and new media content data into the trained neural network model. From the data corresponding to the multiple time steps, the model predicts whether the market value fluctuation range at the target moment after the current moment is within the safe range.
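Steps S230 and S240, together with the early-warning rule of step three in claim 1, can be sketched as follows. The model is treated as any callable that maps a window of feature vectors to the probability that the fluctuation stays within the safe range. The 30-record window and the 0.5 alert threshold are illustrative assumptions, not values fixed by the text.

```python
from typing import Callable, Sequence


def early_warning(history: Sequence[Sequence[float]],
                  model: Callable[[Sequence[Sequence[float]]], float],
                  window: int = 30,
                  alert_below: float = 0.5) -> dict:
    """Predict for the target moment and flag a warning if needed.

    history: feature vectors collected up to the current moment, oldest first.
    model: any trained predictor returning P(fluctuation within safe range).
    """
    if len(history) < window:
        raise ValueError(f"need at least {window} records, got {len(history)}")
    recent = history[-window:]          # the preset period before the current moment
    p_safe = model(recent)
    return {"p_safe": p_safe, "warning": p_safe < alert_below}
```

Keeping the model as a plain callable separates data acquisition (step S230) from inference (step S240). Any trained network can therefore be plugged in.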

In this embodiment, the current moment is the moment at which the market value fluctuation range needs to be predicted for a future target moment. The target moment is one or more moments after the current moment, set according to actual conditions. It may be the next time step after the current moment, or the moment a specified duration after the current moment. That duration is likewise set according to actual conditions and is not specifically limited here. The preset period is also determined according to actual conditions, for example 1 hour, 3 hours or 5 hours, and is not specifically limited here. This lets the enterprise set the target moment flexibly to suit its actual situation and predict the target period.

As shown in FIG. 4, the market value fluctuation warning device 100 can be applied in the electronic device 10 described above to execute the steps of the method. The device 100 includes at least one software function module. Each such module can be stored in the storage module 12 as software or firmware, or embedded in the operating system (OS) of the electronic device 10. The processing module 11 executes the executable modules stored in the storage module 12, such as the software function modules and computer programs included in the device 100.

The market value fluctuation warning device 100 may include an acquisition unit 110 and a training unit 120, which perform the following operations:

The acquisition unit 110 is configured to acquire a training data set. The training data set includes multiple groups of sample data. Each group includes enterprise finance, market value and new media data corresponding to multiple time steps. For some of the groups, the data include data features and a data label obtained by comparing the market value fluctuation range with the safe range;

The training unit 120 is configured to train the neural network model with the training data set. The resulting trained model is used to predict the market value fluctuation range at the target moment after the current moment.

Optionally, the acquisition unit 110 may further be configured to:

obtain multiple groups of enterprise finance, market value and new media data through a sliding window from the data set collected at a specified collection frequency. Each group contains multiple values that are consecutive in the collected time series;

oversample the samples whose market value fluctuation range exceeds the set safe range of the market value fluctuation threshold, amplifying their effect.

Optionally, the training unit 120 may further be configured to train the neural network model with the multiple data features and the data label of each group of sample data. The model thereby learns the relationship between the data features and the data label, yielding the trained neural network model.

Optionally, the market value fluctuation warning device 100 may further include a test unit and an optimization unit. The test unit tests the trained neural network model with test samples to obtain test results. Like the training samples, the test samples include multiple test data features that are consecutive in the time series, together with test data labels. The test results include the predicted market value fluctuation results corresponding to the time series of the test data labels. The optimization unit adjusts the loss function and other settings according to the characteristics of the enterprise or group of enterprises the model is trained for.

Optionally, the market value fluctuation warning device 100 may further include a prediction unit. The acquisition unit 110 may also obtain the required enterprise finance, market value and new media content data for the preset period before the current moment. The prediction unit inputs these data into the trained neural network model. From the data corresponding to the multiple time steps, the model predicts whether the market value fluctuation range at the target moment after the current moment is within the safe range.
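The division of device 100 into units can be illustrated with a small composition sketch in which each unit is a plain Python callable. The class and field names are hypothetical. Any training callable used with it only stands in for the LSTM training described earlier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[Sequence[float]], int]        # (feature window, label)
Model = Callable[[Sequence[Sequence[float]]], float]  # returns P(within safe range)


@dataclass
class MarketValueWarningDevice:
    """Hypothetical wiring of the units of device 100 as plain callables."""
    acquire: Callable[[], List[Sample]]       # role of acquisition unit 110
    train: Callable[[List[Sample]], Model]    # role of training unit 120
    alert_below: float = 0.5                  # assumed warning threshold

    def run(self, recent_window: Sequence[Sequence[float]]) -> bool:
        """Train on the acquired data set, then predict; True means warn."""
        model = self.train(self.acquire())
        return model(recent_window) < self.alert_below
```

For instance, suppose the training callable returns a model that always answers 0.0. Then `run(...)` returns True, which corresponds to an early warning. Test and optimization units could be added as further callables in the same way.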

In summary, the embodiments of this application provide an asset risk control method based on new media data. The method includes two steps.

1. Acquire a training data set that contains multiple groups of sample data. Each group includes enterprise finance, market value and new media data corresponding to multiple time steps. For some of the groups, the data include data features and a data label obtained by comparing the market value fluctuation range with the safe range.
2. Train a neural network model with the training data set to obtain a trained model that predicts the data at a target moment after the current moment.

Because the sample data combine enterprise finance, market value and new media data rather than a single source, the dimensionality of the samples is enriched. This helps improve the accuracy and reliability of the market value range predicted by the trained model. It also mitigates the low accuracy and reliability caused by single-source sample collection. The above embodiments may be combined with one another, in part or in full.

Claims

1. An asset risk control method based on new media data, characterized in that the method comprises the following steps:

step one: obtaining financial data, new media public opinion data and transaction data of an enterprise from a server;

step two: preprocessing financial data and transaction data, summarizing new media public opinion data according to sources and event subjects, and preprocessing the public opinion data;

step three: inputting the financial data, the transaction data and the corresponding data in the new media traffic matrix of the monitored enterprise; predicting through a trained model whether the market value fluctuation will exceed the safe range within a fixed future period; and sending an early warning signal if the fluctuation exceeds the safe range;

preprocessing the public opinion data, which comprises: calculating the traffic of each type of public opinion content; calculating the fine-grained emotion intensity of the content according to an emotion vocabulary ontology library; integrating the traffic and emotion intensity of each type of content to construct the enterprise new media traffic matrix; and integrating the labelled enterprise market risk level, the financial data, the transaction data, and the traffic and emotion data of the new media traffic matrix into an asset risk control data set, wherein the asset risk control data set comprises a training set and a test set, and the training set is input into a deep neural network for training;

calculating the traffic data and emotional tendencies associated with the new media traffic matrix by using the Ekman emotion model;

the step of calculating the traffic data and emotional tendencies in the matrix comprises the following steps:

step one: verifying the source of the daily new media public opinion data against a main-body source library built by the system, the library being divided into five types (official media, mainstream commercial media, influential financial self-media, other high-influence self-media, and ordinary self-media), and updating the self-media entries in the main-body source library;

step two: removing stop words from the new media data and performing word segmentation; assigning the main event of each piece of public opinion, according to the segmentation results, to one of three classes (enterprise performance and financial analysis; executives and key personnel actions; enterprise marketing business); combining these with the source classification results to complete the construction of the new media traffic matrix; and meanwhile collating new vocabulary appearing in the segmentation and updating the vocabulary library of the corresponding event subject;

step three: calculating the traffic of the new media public opinion data assigned to each subset of the matrix; determining the proportion of each emotional tendency of each user by a natural language processing method; and storing the data in order from source to event subject, and from traffic to the proportion of each emotion corresponding to that traffic.

2. An asset risk control method based on new media data as claimed in claim 1, wherein the event subjects correspond to three classes: enterprise performance and financial analysis; executives and key personnel actions; and enterprise marketing business.

3. An asset risk control method based on new media data as claimed in claim 1, wherein the sources of the new media public opinion data correspond to five categories: official media, mainstream commercial media, influential financial self-media, high-influence non-financial self-media, and ordinary self-media.

4. An asset risk control method based on new media data as claimed in claim 1, wherein the model parameters for predicting enterprise market value fluctuation are calculated as follows: a long short-term memory neural network model is trained with the features corresponding to the financial data, transaction data and new media traffic matrix in each group of enterprise sample data, together with the data label indicating whether the market value fluctuation is within the safe range, so that the model learns the relationship between the data features and the data labels, yielding the trained long short-term memory neural network model.

5. An asset risk control device based on new media data, comprising a processing module (11) and a storage module (12), characterized by further comprising a market value fluctuation warning device (100) embedded in the storage module (12), the market value fluctuation warning device (100) being adapted to perform the asset risk control method based on new media data of claim 1.

6. An asset risk control device based on new media data as claimed in claim 5, wherein the market value fluctuation warning device (100) comprises an acquisition unit (110) for acquiring a training data set and a training unit (120) for training a neural network model with the training data set.

7. An asset risk control device based on new media data as claimed in claim 6, further comprising a test unit for testing the trained neural network model with test samples, and an optimization unit for optimizing the model according to the difference between the safe-range label in the test result and whether the market value fluctuation is actually within the safe range.