

CN116680112B - Memory state detection method, device, communication equipment and storage medium - Google Patents

Memory state detection method, device, communication equipment and storage medium

Publication number
CN116680112B
Authority
CN
China
Prior art keywords
memory
data
health
error
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310935420.8A
Other languages
Chinese (zh)
Other versions
CN116680112A (en)
Inventor
李盛新
李道童
贾帅帅
陈衍东
韩红瑞
艾山彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310935420.8A
Publication of CN116680112A
Application granted
Publication of CN116680112B
Priority to PCT/CN2024/087888 (WO2025025683A1)


Abstract


Embodiments of the present application provide a memory state detection method, device, communication equipment and storage medium. The method includes: obtaining memory data; dividing the memory data into first memory data and second memory data according to the type of the memory data; determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health impact factor, wherein the preset memory health assessment model is determined based on first historical memory data and the input-output model is generated by training a preset initial model on second historical memory data; and determining the memory state according to the preliminary memory health score and the health impact factor. In other words, the application divides the memory data into two categories according to how the data affects the memory and adjusts the memory health score with the health impact factor, so that memory health can be detected effectively and accurately.

Description

Memory state detection method, device, communication equipment and storage medium

Technical Field

The present application relates to the field of data processing technology, and in particular to a memory state detection method, device, communication equipment and storage medium.

Background Art

In modern large data centers, millions of servers usually work together to provide high-performance computing and big data storage services. Because these servers run a large number of tasks, hardware failures can severely affect server reliability, availability, and serviceability (RAS). In a server system, memory (also called internal memory) temporarily stores the data the CPU is operating on, as well as data exchanged with external storage such as hard disks. All programs on a computer run in memory, so memory anomalies have a large impact on the computer. Memory failure is also one of the most common hardware threats. To guard against memory errors, servers usually equip memory with advanced Error Correction Code (ECC) mechanisms such as SEC-DED and ChipKill. However, ECC alone is far from sufficient to guarantee memory reliability. In modern data centers, memory failure has been shown to be a leading cause of server downtime and system failure, and the continuing growth of computing density and memory capacity raises the risk of memory failure further.

In the related art, practitioners build a model with some artificial intelligence algorithm, for example one based on machine learning or deep learning, to compute memory health from factors such as the number of detected memory correctable errors (CE), their frequency of occurrence, operating temperature, number of insertion and removal cycles, and power consumption. However, such schemes feed all memory-related information into the algorithm as inputs of the same dimension. This increases the training computation, makes the resulting model unstable, and prevents the current memory state from being reported accurately.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a memory state detection method, device, communication equipment and storage medium that solve the technical problem that the prior art cannot accurately report the current memory state. The specific technical solution is as follows:

In a first aspect of the present application, a memory state detection method is provided, the method comprising:

obtaining memory data;

dividing the memory data into first memory data and second memory data according to the type of the memory data;

determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health impact factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model on second historical memory data;

determining the current memory state according to the preliminary memory health score and the health impact factor.

Optionally, the second memory data includes actual input data and actual output data in one-to-one correspondence, and processing the second memory data through the input-output model to determine the health impact factor includes:

inputting the actual input data into the input-output model to obtain predicted output data corresponding to the actual input data;

comparing the predicted output data with the actual output data to obtain an error value;

determining the health impact factor according to the error value.

Optionally, the actual input data includes at least one of the following: average voltage, average runtime frequency, and average erase/write speed data; the actual output data includes average memory temperature data. Before the step of inputting the actual input data into the input-output model to obtain the corresponding predicted output data, the method includes:

preprocessing the actual input data collected within multiple preset time periods to generate data sets corresponding to those time periods;

selecting, from those data sets, a target data set in which the memory state is normal within the three preset time periods corresponding to the current moment;

normalizing the target data set to obtain training samples.
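The selection and normalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the per-period record layout, the `is_normal` flag, reading "the three preset time periods corresponding to the current moment" as the three most recent normal periods, and the use of min-max scaling are all hypothetical choices (the text only says the target data set is "normalized").

```python
from typing import Dict, List, Sequence


def select_normal_periods(periods: List[Dict], count: int = 3) -> List[Dict]:
    """Keep the most recent `count` periods whose memory state was normal.

    Each period is a hypothetical dict of the form
    {"is_normal": bool, "rows": [[voltage, frequency, write_speed, temperature], ...]}
    and `periods` is assumed to be ordered from oldest to newest.
    """
    normal = [p for p in periods if p["is_normal"]]
    return normal[-count:]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Min-max scaling is used here only because it keeps all four columns on a comparable range; any other normalization could be substituted without changing the surrounding steps.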

Optionally, after the step of normalizing the target data set to obtain training samples, the method includes:

training the preset initial model on the training samples to obtain the input-output model.

Optionally, determining the health impact factor according to the error value includes:

setting the health impact factor to 1 when the error value is detected to be less than a target error threshold;

when the error value is detected to be greater than the target error threshold, obtaining a preset memory health policy and determining the health impact factor according to that policy.
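The threshold rule above can be sketched as a small function. The memory health policy is not specified at this point in the text, so it is passed in as a callable; `example_policy` is purely hypothetical and only shows the expected shape. The text does not say what happens when the error equals the threshold; this sketch routes that case to the policy as an assumption.

```python
from typing import Callable


def health_impact_factor(error: float, threshold: float,
                         policy: Callable[[float, float], float]) -> float:
    """Return 1 when the prediction error is below the threshold,
    otherwise defer to the supplied memory health policy."""
    if error < threshold:
        return 1.0
    # Assumption: error == threshold is also handled by the policy.
    return policy(error, threshold)


def example_policy(error: float, threshold: float) -> float:
    """Hypothetical policy: shrink the factor as the error grows past the threshold."""
    return threshold / error if error > 0 else 1.0
```

For example, `health_impact_factor(2.0, 1.0, example_policy)` yields 0.5 under this illustrative policy, while any error below the threshold leaves the factor at 1.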

Optionally, the target error threshold is determined according to the root mean square error corresponding to a target mean square error vector between the predicted output data and the actual output data.
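As an illustration, the root mean square error between predicted and actual outputs can be computed as below. How the threshold is then derived from this value is not stated here; using the RMSE directly as the threshold would be an assumption.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between two equally long, non-empty sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```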

Optionally, determining the current memory state according to the preliminary memory health score and the health impact factor includes:

determining a target memory health score for the memory at the current moment according to the preliminary memory health score, the health impact factor at the current moment, and the health impact factor within the previous preset time period corresponding to the current moment.

Optionally, the target memory health score is generated by a formula.

[Formula and symbols not recoverable from the source text.] The symbols in the formula denote, respectively: the target memory health score; the preliminary memory health score; the health impact factor at the current moment; the health impact factor within the previous preset time period corresponding to the current moment; the average of the error values between multiple predicted output data and actual output data; and the target error threshold.

Optionally, after the step of determining the target memory health score for the memory at the current moment, the method includes:

sending warning information to the user when the target memory health score is detected to be less than a target health threshold, or when the health impact factor is less than or greater than 1 (that is, not equal to 1), so that the user can view the current memory state through a preset display interface.
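The warning condition above reduces to a simple predicate. The sketch below assumes the factor is stored as a float and compares it to 1 with a tolerance; the function name and the example threshold of 60 in the usage note are hypothetical.

```python
import math


def should_warn(target_score: float, factor: float, health_threshold: float) -> bool:
    """Warn when the score drops below the threshold or the factor deviates from 1."""
    return target_score < health_threshold or not math.isclose(factor, 1.0)
```

With an assumed threshold of 60, a score of 90 and a factor of exactly 1 raises no warning, while a factor of 0.8 does even though the score is still high.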

Optionally, the first memory data is obtained from register logs and includes at least one of the following: memory hard faults, number of memory errors, memory error type, and enabled error repair operations.

Optionally, the preset memory health assessment model includes:

deducting a first preset score when a preset number of memory hard faults is detected;

deducting a second preset score when the number of memory errors within a preset time is detected to exceed a preset threshold;

determining, according to the memory error type, a third preset score corresponding to that error type;

determining, according to the enabled error repair operation, a fourth preset score corresponding to that operation.

Optionally, the memory error type includes at least one of the following: memory hard error, memory soft error, SRAO error, UCNA error, SRAR error, and sudden fatal error.

Optionally, determining the third preset score corresponding to the memory error type includes:

deducting the third preset score when the memory error type is detected to be one of memory hard error, memory soft error, SRAO error, UCNA error, or SRAR error;

setting the preliminary memory health score to 0 when the memory error type is detected to be a sudden fatal error.

Optionally, the enabled error repair operation includes at least one of the following: consuming PCLS, enabling the Bank-level ADDDC function, enabling the Rank-level ADDDC function, and enabling the storm suppression function.

Optionally, the scores corresponding to the enabled error repair operations, ordered from smallest to largest, are: consuming PCLS, enabling the Bank-level ADDDC function, enabling the Rank-level ADDDC function, and enabling the storm suppression function.

Optionally, the data set corresponding to the input-output model includes four columns of vector data, and the input-output model maps three inputs to one output.

Optionally, the three inputs are the memory's average voltage, average runtime frequency, and average erase/write speed, and the output is the memory's average temperature.

In another aspect of the present application, a memory state detection device is provided, the device comprising:

an acquisition module, configured to obtain memory data;

a division module, configured to divide the memory data into first memory data and second memory data according to the type of the memory data;

a first determination module, configured to determine a preliminary memory health score according to a preset memory health assessment model and the first memory data, and to process the second memory data through an input-output model to determine a health impact factor, wherein the input-output model is generated by training a preset initial model on second historical memory data;

a second determination module, configured to determine the current memory state according to the preliminary memory health score and the health impact factor.

In another aspect of the present application, a communication device is provided, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement any of the memory state detection methods described above when executing the program stored in the memory.

In another aspect of the present application, a computer-readable storage medium is provided. The medium stores instructions which, when run on a computer, cause the computer to execute any of the memory state detection methods described above.

In another aspect of the present application, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute any of the memory state detection methods described above.

The memory state detection method provided in the embodiments of the present application obtains memory data; divides the memory data into first memory data and second memory data according to the type of the memory data; determines a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processes the second memory data through an input-output model to determine a health impact factor, wherein the preset memory health assessment model is determined based on first historical memory data and the input-output model is generated by training a preset initial model on second historical memory data; and determines the current memory state according to the preliminary memory health score and the health impact factor.

That is, the embodiments analyze the factors that affect the healthy operation of server memory and divide the corresponding memory data into two categories, the first memory data and the second memory data. The first historical memory data corresponding to the first memory data is used to generate the preset memory health assessment model, and the preliminary memory health score is determined from the first memory data according to that model. The second historical memory data is processed and used to train the preset initial model, producing the input-output model; feeding the second memory data into the trained input-output model yields the memory's health impact factor. The current memory state is then determined from the preliminary memory health score and the health impact factor. By dividing the memory data into two categories according to how the data affects the memory, and by adjusting the memory health score with the health impact factor, the application can detect memory health effectively and accurately.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below.

FIG. 1 shows a first flowchart of the steps of the memory state detection method provided by an embodiment of the present application;

FIG. 2 shows a second flowchart of the steps of the memory state detection method provided by an embodiment of the present application;

FIG. 3 shows a third flowchart of the steps of the memory state detection method provided by an embodiment of the present application;

FIG. 4 shows a fourth flowchart of the steps of the memory state detection method provided by an embodiment of the present application;

FIG. 5 shows a block diagram of a memory state detection device provided by an embodiment of the present application;

FIG. 6 shows a structural block diagram of a communication device provided by an embodiment of the present application;

FIG. 7 shows a schematic diagram of an input-output model training method provided by an embodiment of the present application;

FIG. 8 shows a schematic diagram of a three-input, single-output ANFIS structure provided by an embodiment of the present application;

FIG. 9 shows a schematic diagram of a method for generating an initial FIS network structure provided by an embodiment of the present application;

FIG. 10 shows a schematic diagram of a preset display interface provided by an embodiment of the present application;

FIG. 11 shows a schematic diagram of another preset display interface provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below with reference to the drawings in those embodiments.

To make the purpose, technical solutions and advantages of the embodiments clearer, each embodiment is described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in each embodiment to help the reader understand the application; the claimed technical solutions can nevertheless be implemented without these details and with various changes and modifications based on the following embodiments. The embodiments are divided as they are for convenience of description only; the division does not limit the specific implementation of the application, and the embodiments may be combined with and refer to one another where they do not contradict.

Referring to FIG. 1, which shows a first flowchart of the steps of the memory state detection method provided by an embodiment of the present application, the method may include:

Step 101: obtaining memory data.

It should be noted that, in the embodiments of the present application, the root cause of a memory error is called a fault, irreversible damage in hardware is called a hard fault, and an observed symptom of a fault is called an error. An error caused by a hard fault is called a hard error; hard faults are persistent and recurring by nature and cannot be resolved by the passage of time or by resetting or restarting the system. Soft faults and soft errors, by contrast, are random and transient, for example bit flips caused by particle strikes. The number of memory hard faults refers to the number of cells with hard faults on the same memory module, more specifically the number of cells in which two or more errors have occurred. The error frequency refers to the ratio of all memory errors detected within a time period to the length of that period, where all memory errors include soft errors and recurring hard errors. The error type refers to the different kinds of memory error, such as SRAO (SW Recoverable Action Optional) errors, for which handling is optional, UCNA (Uncorrected No Action) errors, which require no handling, and SRAR (SW Recoverable Action Required) errors, which must be handled. A repair operation refers to the enabling of certain memory RAS technologies, such as Partial Cache-line Sparing (PCLS), described here as replacing faulty rows with redundant rows within a DRAM chip, and Adaptive Double DRAM Device Correction (ADDDC).

Therefore, the memory data in the embodiments of the present application may include all memory-related information that can affect the memory state.

Step 102: dividing the memory data into first memory data and second memory data according to the type of the memory data.

It should be noted that, after the memory data is obtained in step 101, the factors that affect the healthy operation of server memory can be analyzed and divided into two categories.

The first category consists of direct factors that reflect memory health directly. The memory data corresponding to these factors is classified as first memory data, for example the number of memory hard faults, error frequency, error type, and error repair operations. The first memory data can be obtained from register logs.

The second category consists of indirect factors that reflect memory anomalies indirectly. The memory data corresponding to these factors is classified as second memory data, for example the average memory temperature, average voltage, average runtime frequency, and average erase/write speed measured over a given time period. An anomaly in an indirect factor does not necessarily affect memory health directly, that is, it does not necessarily cause memory errors, but abnormal values can give some early warning of an abnormal memory state. The second memory data can be measured by sensors.
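The two-way split described above can be sketched as a key-based partition. The field names below are hypothetical labels for the direct and indirect factors listed in the text; the source does not define a record format.

```python
from typing import Any, Dict, Tuple

# Hypothetical field names for direct factors (read from register logs).
DIRECT_KEYS = {"hard_fault_count", "error_frequency", "error_type", "repair_operation"}
# Hypothetical field names for indirect factors (measured by sensors).
INDIRECT_KEYS = {"avg_temperature", "avg_voltage", "avg_frequency", "avg_write_speed"}


def split_memory_data(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition one memory record into (first memory data, second memory data)."""
    first = {k: v for k, v in record.items() if k in DIRECT_KEYS}
    second = {k: v for k, v in record.items() if k in INDIRECT_KEYS}
    return first, second
```

Keys that belong to neither set are dropped in this sketch; a real collector might instead log or reject them.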

Step 103: determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health impact factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model on second historical memory data.

It should be noted that, once the memory data has been divided, the different types of memory data can be processed differently.

Specifically, for the first memory data, the preset memory health assessment model can be generated from the characteristics of that data, namely the information about direct factors that affect the memory state.

The preset memory health assessment model determines, according to the importance of each item of first memory data, how many points to deduct when a given condition occurs. In other words, the model uses a deduction strategy: each detected memory error reduces the score according to the corresponding rule, and the score ranges from 0 to 100. The higher the score, the healthier the memory module; the lower the score, the higher the module's failure risk.

Specifically, Table 1 gives an exemplary framework mapping memory health scores to memory states.

Table 1: An exemplary framework of memory states corresponding to different memory health scores

It should be noted that the first memory data is obtained from register logs and includes at least one of the following: memory hard faults, number of memory errors, memory error type, and enabled error repair operations.

The preset memory health assessment model includes: deducting a first preset score when a preset number of memory hard faults is detected; deducting a second preset score when the number of memory errors within a preset time is detected to exceed a preset threshold; determining, according to the memory error type, a third preset score corresponding to that error type; and determining, according to the enabled error repair operation, a fourth preset score corresponding to that operation.

Further, the memory error type includes at least one of the following: memory hard error, memory soft error, SRAO error, UCNA error, SRAR error, and sudden fatal error.

Determining the third preset score corresponding to the memory error type includes: deducting the third preset score when the memory error type is detected to be one of memory hard error, memory soft error, SRAO error, UCNA error, or SRAR error; and setting the preliminary memory health score to 0 when the memory error type is detected to be a sudden fatal error.

进一步地,所述使能错误修复操作包括以下至少一种:消耗PCLS、使能Bank级别ADDDC功能、使能Rank级别ADDDC功能以及使能风暴抑制功能。Furthermore, the enabling error repair operation includes at least one of the following: consuming PCLS, enabling a Bank-level ADDDC function, enabling a Rank-level ADDDC function, and enabling a storm suppression function.

所述使能错误修复操作对应的分数排序从小到大为:消耗PCLS、使能Bank级别ADDDC功能、使能Rank级别ADDDC功能以及使能风暴抑制功能。The scores corresponding to the enabling error repair operations are ranked from small to large as follows: consuming PCLS, enabling the Bank-level ADDDC function, enabling the Rank-level ADDDC function, and enabling the storm suppression function.

需要说明的是,在本申请实施例中,预设内存健康度评估模型根据不同的第一内存数据对应不同的扣分策略。It should be noted that, in the embodiment of the present application, the preset memory health assessment model corresponds to different deduction strategies according to different first memory data.

Specifically, the first rule is the deduction rule for the number of memory hard faults. Memory hard errors are persistent, repeating errors caused by memory hard faults and are one of the main contributors to memory failure. A server's RAS policy normally does not let them keep recurring; it handles the cell with the hard fault by enabling an isolation or repair strategy. Memory soft faults and soft errors, by contrast, are usually transient: a cell that shows a soft error will normally not repeat the error even if it is left unhandled. This part therefore considers only the effect of hard faults on the physical memory hardware. In the embodiments of the present application, for example, the deduction rule is: deduct 1 point for every 10 hard faults detected, and count any remainder of fewer than 10 hard faults as 1 point.
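The hard-fault rule can be sketched as a ceiling division. This is a minimal Python illustration using the example values of this embodiment (1 point per 10 hard faults, a partial group also counted as 1 point); the function name `hard_fault_deduction` and its parameters are hypothetical and not taken from the application.

```python
def hard_fault_deduction(hard_fault_count: int, faults_per_point: int = 10) -> int:
    """Return the points to deduct for a given number of hard faults.

    Every full group of `faults_per_point` faults costs 1 point, and a
    remaining partial group also costs 1 point (ceiling division).
    """
    if hard_fault_count < 0:
        raise ValueError("hard_fault_count must be non-negative")
    # Integer ceiling division without floating point.
    return -(-hard_fault_count // faults_per_point)
```

For instance, 25 hard faults fall into two full groups plus a partial one, so under this reading 3 points would be deducted.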

The second rule is the deduction rule for memory error frequency. When a memory module produces too many errors within a given time period, the large number of errors causes frequent interrupt reporting that interferes with normal system operation, and the high error frequency itself also reflects, to some extent, an abnormality in the memory module. Note that this rule is mainly intended to capture the relationship between bursts of memory errors in a short time and memory abnormality, so its threshold should not exceed the memory error storm suppression threshold. In the embodiments of the present application, for example, the deduction rule is: deduct 1 point if more than 500 memory errors are detected within 60 seconds.
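A possible reading of the frequency rule is sketched below in Python. It assumes fixed, non-overlapping 60-second windows and error timestamps given in seconds; the text does not say whether the windows are fixed or sliding, so this choice, like the name `frequency_deduction`, is an assumption for illustration.

```python
from collections import Counter


def frequency_deduction(error_timestamps, window_s=60, threshold=500, points=1):
    """Deduct `points` for every fixed window holding more than `threshold` errors.

    `error_timestamps` is an iterable of error times in seconds.
    """
    # Assign each error to the index of the window it falls into.
    per_window = Counter(int(t // window_s) for t in error_timestamps)
    return sum(points for count in per_window.values() if count > threshold)
```

Exactly 500 errors in a window do not trigger the deduction, because the rule requires the count to exceed the threshold.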

The third rule is the deduction rule for memory error types. Memory errors can be divided into correctable errors and uncorrectable errors, each of which can be further subdivided. Correctable memory errors include patrol correctable errors, read/write correctable errors, mirror write-back failure errors, and so on; in the present invention, correctable errors are divided only into memory hard errors and memory soft errors, which are handled separately. Uncorrectable errors can be divided into: ① SRAO errors, uncorrectable errors for which handling is optional, with error code Error_Type: UCE; MSCODE: 0x0010; ② UCNA errors, uncorrectable errors that need no handling, with error code Error_Type: UCE; MSCODE: 0x0101; ③ SRAR errors, uncorrectable errors that must be handled, with error code Error_Type: UCE; MSCODE: 0x0010; ④ sudden fatal errors, uncorrectable errors in which a physical hardware error of the memory module brings the system down, with error code Error_Type: UCE. In the embodiments of the present application, for example, the deduction rule is: deduct 5 points for each occurrence of a UCNA, SRAO, or SRAR error, and reduce the memory health score to 0 when a sudden fatal error occurs.
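The error-type rule can be illustrated with a small lookup. The sketch below uses only the example values stated for this embodiment (5 points per UCNA, SRAO, or SRAR occurrence; a sudden fatal error sets the score to 0). The string labels, the clamping at 0, and the function name are assumptions; deductions for memory hard and soft errors are not given numerically in the text, so they are left out.

```python
# Example deductions from this embodiment; the labels are hypothetical identifiers.
UNCORRECTABLE_DEDUCTION = {"UCNA": 5, "SRAO": 5, "SRAR": 5}
FATAL = "FATAL"


def apply_error_type_rule(score, error_types):
    """Apply the error-type deductions to `score` for a sequence of observed errors."""
    for error_type in error_types:
        if error_type == FATAL:
            # A sudden fatal error drops the health score straight to 0.
            return 0
        score -= UNCORRECTABLE_DEDUCTION.get(error_type, 0)
    # Keep the score inside the 0-100 range used by the model.
    return max(score, 0)
```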

It should be noted that, besides the UCNA, SRAO, and SRAR error types above, memory error types also include certain specific errors classified by memory structure, such as row errors, column errors, and Bank errors. These structure-based error types are defined by the code rules set by the relevant memory RAS engineers; different code rules produce different definitions, so the deduction rules set for them differ as well.

The fourth rule is the deduction rule for enabled error repair operations. In memory RAS technology, when a memory hard fault is detected, the OS layer is normally enabled to perform memory page isolation on the page containing the error; isolated pages are no longer read or written. When some memory pages cannot be isolated for some reason, memory repair technology is enabled to repair the hard error. Repair technology refers to the redundancy-based fault replacement mechanisms of the processor or memory, such as PCLS and ADDDC. Repair resources are generally limited, and some repair mechanisms reduce memory performance. PCLS can be invoked at most 16 times per memory channel and at most once per cache line. ADDDC is an advanced memory RAS feature; each memory channel can support two sets of VLS registers at the same time, and ADDDC can be enabled at Bank level or Rank level. Enabling Rank-level ADDDC costs roughly 25%-30% of memory performance; it also indicates, to some extent, that many memory errors have accumulated and that the remaining repair resources of the memory RAS technology are nearly exhausted, so a repair-resource warning is issued to the user. When the memory suddenly produces a large number of errors that memory isolation and repair cannot handle, and the error count reaches a storm threshold (set differently by different server vendors according to their RAS policies), the SMI storm suppression function is enabled. It prevents the BIOS layer from reporting a flood of System Management Interrupt (SMI) messages to the OS layer and disrupting normal system operation, and it disables the memory page isolation function, which indicates that the memory module is no longer healthy and needs to be replaced. In the embodiments of the present application, for example, the deduction rule is: deduct 1 point each time PCLS is consumed, 10 points for enabling the Bank-level ADDDC function, 20 points for enabling the Rank-level ADDDC function, and 50 points for enabling the storm suppression function.
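The repair-operation rule amounts to a weighted sum over logged repair events. The following Python sketch uses the example deductions of this embodiment; the event labels and the function name `repair_deduction` are hypothetical, and the sketch assumes each logged event is charged once.

```python
# Example deductions from this embodiment, in ascending order of severity.
REPAIR_DEDUCTION = {
    "PCLS": 1,                    # one PCLS resource consumed
    "ADDDC_BANK": 10,             # Bank-level ADDDC enabled
    "ADDDC_RANK": 20,             # Rank-level ADDDC enabled
    "SMI_STORM_SUPPRESSION": 50,  # SMI storm suppression enabled
}


def repair_deduction(events):
    """Sum the deductions for a sequence of repair-operation events."""
    return sum(REPAIR_DEDUCTION[event] for event in events)
```

Two PCLS consumptions followed by one Bank-level ADDDC activation would cost 12 points under these values.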

Specifically, the second memory data is indirect-factor information that affects the memory state indirectly. In the embodiments of the present application, an input-output model is therefore obtained by training on the second memory data, and an impact factor, namely the health impact factor, can be obtained from the input-output model.

The health impact factor can serve as one of the conditions for finally determining the memory state.

The training process of the input-output model is described later.

Step 104: determine the current memory state according to the preliminary memory health score and the health impact factor.

It should be noted that, once the preliminary memory health score and the health impact factor obtained through the input-output model have been determined in step 103, the current memory state can be determined from these data.

Specifically, as explained above, the factors that reflect memory health are divided into direct factors and indirect factors. Direct factors are used to establish the preliminary health scoring strategy, and indirect factors serve as one way of evaluating memory abnormality. An input-output model is built from the indirect factors, and a health impact factor is set from it to further correct the memory health score. When the output of the input-output model is compared with the measured data, a comparison error below the set threshold is reported as no memory abnormality; in that case the indirect factors do not affect the preliminary health score established from the direct factors. If the result reported by the indirect factors is a memory abnormality, its effect on the memory health score is necessarily negative.

Therefore, the final memory health score can be determined jointly from the preliminary memory health score and the health impact factor, and the current memory state can then be determined according to, for example, Table 1.

The memory state detection method provided by the embodiments of the present application obtains memory data; divides the memory data into first memory data and second memory data according to the type of the memory data; determines a preliminary memory health score according to a preset memory health evaluation model and the first memory data, and processes the second memory data through an input-output model to determine a health impact factor, wherein the preset memory health evaluation model is determined based on first historical memory data and the input-output model is generated by training a preset initial model on second historical memory data; and determines the current memory state according to the preliminary memory health score and the health impact factor. That is, the embodiments analyze the factors that affect the healthy operation of server memory and divide the corresponding memory data into two categories, the first memory data and the second memory data. The first historical memory data corresponding to the first memory data is used to generate the preset memory health evaluation model, and the preliminary memory health score of the memory is determined from the first memory data according to that model. The second historical memory data is processed and used to train the preset initial model, producing the input-output model; feeding the second memory data into the trained input-output model yields the health impact factor of the memory. The current memory state is then determined from the preliminary memory health score and the health impact factor. By dividing memory data into two categories according to its effect on the memory and adjusting the memory health score with the health impact factor, the present application can detect memory health effectively and accurately.

Referring to FIG. 2, a second flowchart of the steps of the memory state detection method provided by an embodiment of the present application is shown. The method may include:

Step 201: obtain memory data;

Step 202: divide the memory data into first memory data and second memory data according to the type of the memory data;

It should be noted that steps 201-202 are as described above and are not repeated here.

Step 203: determine a preliminary memory health score according to a preset memory health evaluation model and the first memory data; and input actual input data into an input-output model to obtain predicted output data corresponding to the actual input data, compare the predicted output data with actual output data to obtain an error value, and determine a health impact factor according to the error value, wherein the preset memory health evaluation model is determined based on first historical memory data, the input-output model is generated by training a preset initial model on second historical memory data, the second memory data includes the actual input data and the actual output data, and the actual input data and the actual output data correspond one to one;

Further, the input-output model is obtained by training a preset initial model. Therefore, before step 203 and referring to FIG. 4, the method may include the following steps:

Step 001: preprocess the actual input data collected within multiple preset time periods to generate the data sets corresponding to those preset time periods;

Step 002: from the data sets corresponding to the multiple preset time periods, select the target data sets for which the memory state was normal within the three preset time periods corresponding to the current moment;

Step 003: normalize the target data sets to obtain training samples.

Step 004: train the preset initial model on the training samples to obtain the input-output model.

It should be noted that this embodiment mainly describes the processing of the second memory data. The second memory data is indirect-factor information that affects the memory state indirectly; therefore, in the embodiments of the present application, an input-output model is obtained by training on the second memory data, and an impact factor, namely the health impact factor, can be obtained from the input-output model.

FIG. 7 shows how the training samples for the input-output model are obtained and how the model is trained. During the initial time period, the measured memory temperature, voltage, runtime frequency, and erase/write speed are actively collected once every 1 s, and every ten data points are averaged to obtain the indirect factor information "dataset_0". The indirect factor information "dataset_0" forms the "total indirect factor dataset". The collection time period can be set to 1 hour, 5 hours, or 24 hours.
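The ten-sample averaging described above can be sketched as follows. This is a minimal Python illustration, assuming each 1 s sample is a tuple of measurements and that a trailing group of fewer than ten samples is discarded; the function name `block_average` and the handling of the incomplete group are assumptions, since the text does not specify them.

```python
def block_average(samples, block=10):
    """Average consecutive groups of `block` samples, column by column.

    `samples` is a list of equal-length tuples, one per 1 s reading, for
    example (temperature, voltage, frequency, erase_write_speed).
    """
    rows = []
    # Step through complete groups only; an incomplete tail is dropped.
    for start in range(0, len(samples) - block + 1, block):
        group = samples[start:start + block]
        rows.append(tuple(sum(column) / block for column in zip(*group)))
    return rows
```

With a one-hour collection period this would reduce 3600 raw readings to 360 averaged rows.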

Further, the data set corresponding to the input-output model includes four columns of vector data, and the input-output model maps three inputs to one output.

Further, the three inputs are the average voltage, the average runtime frequency, and the average erase/write speed of the memory, and the output is the average temperature of the memory.

The total indirect factor dataset thus contains four columns of vector data: the first three columns are the average voltage, average runtime frequency, and average erase/write speed of the memory, and the last column is the corresponding average temperature. Each data vector is normalized to the interval 0 to 1 to obtain the training samples. The first three columns of the training samples are used as the input of the input-output model during training, and the last column is used as its output.
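The normalization and column split can be sketched as below. The sketch assumes plain min-max scaling per column (the text only states that each vector is normalized to the 0 to 1 interval) and maps a constant column to 0.0; both function names are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` to the interval [0, 1] independently."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def split_inputs_output(rows):
    """Split rows of (voltage, frequency, erase/write speed, temperature)."""
    inputs = [row[:3] for row in rows]
    outputs = [row[3] for row in rows]
    return inputs, outputs
```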

Therefore, once the input-output model has been trained, the acquired second memory data is fed into it. The second memory data includes actual input data and actual output data in one-to-one correspondence; the actual input data includes at least one of the average voltage, the average runtime frequency, and the average erase/write speed, and the actual output data includes the average memory temperature.

Specifically, the actual input data is fed into the input-output model to obtain the corresponding predicted output data; the predicted output data is compared with the actual output data to obtain an error value, and the health impact factor is determined from the error value.

At the end of this time period, a Takagi-Sugeno type FIS structure can be generated from the training samples by grid partitioning, which determines the initial values of the membership function parameters. See FIG. 9, which is a schematic diagram of the initial FIS network structure generated by the grid partitioning method.

It should be noted that, in the embodiments of the present application, the preset initial model may be an adaptive neuro-fuzzy inference system (ANFIS). ANFIS is a multiple-input, single-output system, and the ANFIS in the present application has three inputs and one output. As shown in FIG. 8, the ANFIS structure consists of five layers, which from input to output are the fuzzification layer, the rule layer, the normalization layer, the defuzzification layer, and the output layer.

Specifically, in ANFIS the parameters of the membership functions are determined by training on the sample data set. The way membership functions are combined or interact is called a rule. In if-then form, the rules are described as follows:

Rule 1: if $x$ is $A_1$ and $y$ is $B_1$ and $z$ is $C_1$, then $f_1 = p_1 x + q_1 y + r_1 z + s_1$

Rule 2: if $x$ is $A_2$ and $y$ is $B_2$ and $z$ is $C_2$, then $f_2 = p_2 x + q_2 y + r_2 z + s_2$

Rule 3: if $x$ is $A_3$ and $y$ is $B_3$ and $z$ is $C_3$, then $f_3 = p_3 x + q_3 y + r_3 z + s_3$

In the above if-then form, $x$, $y$ and $z$ are the inputs of the node; $f_i$ is the output; $A_i$, $B_i$ and $C_i$ are the fuzzy sets associated with the inputs $x$, $y$ and $z$ respectively; $p_i$, $q_i$, $r_i$ and $s_i$ are the result parameters, usually called consequent parameters.

Layer 1: the node function of this layer is the membership function.

$O_{1,i} = \mu_{A_i}(x),\ \mu_{B_i}(y),\ \mu_{C_i}(z), \quad i = 1, 2, 3$ (Formula 1)

In Formula 1, $O_{1,i}$ is the output value of this layer; $i$ is the index over the fuzzy sets of the input signals; $\mu_{A_i}$, $\mu_{B_i}$ and $\mu_{C_i}$ are generalized bell-shaped membership functions (gbellmf), defined as

$\mu_{A_i}(x) = \dfrac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}}$ (Formula 2)

In Formula 2, $a_i$, $b_i$ and $c_i$ are the condition parameters, also commonly called premise (antecedent) parameters; changing their values changes the shape of the gbellmf.

Layer 2: the nodes of this layer are labeled $\Pi$. The output value $w_i$ is computed by multiplying all incoming membership function values.

$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y)\,\mu_{C_i}(z)$ (Formula 3)

In Formula 3, $w_i$ denotes the firing strength of the $i$-th rule.

Layer 3: the nodes of this layer are labeled $N$. The outputs of the previous layer are normalized, and the output value $\bar{w}_i$ is the normalized firing strength.

$O_{3,i} = \bar{w}_i = \dfrac{w_i}{w_1 + w_2 + w_3}$ (Formula 4)

In Formula 4, $O_{3,i}$ is the output value of the third layer.

Layer 4: this layer creates an adaptive function linking the normalized firing strength and the result function; its output is the product of the third-layer output and the rule's result function evaluated on the inputs.

$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i z + s_i \right)$ (Formula 5)

In Formula 5, $p_i$, $q_i$, $r_i$ and $s_i$ are the result parameters, usually called consequent parameters.

Layer 5: this layer is labeled $\Sigma$. It computes the total output as the sum of all incoming signals.

$O_5 = \sum_i \bar{w}_i f_i$ (Formula 6)

In Formula 6, $O_5$ is the sum of all incoming signals.
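The five layers can be traced in a few lines of code. The sketch below is a forward pass only, written in pure Python for a three-input, single-output Takagi-Sugeno ANFIS with generalized bell membership functions; it assumes each rule carries one (a, b, c) triple per input and one (p, q, r, s) consequent tuple. The data layout and the names `gbellmf` and `anfis_forward` are illustrative choices, not the application's implementation, and parameter training is not shown.

```python
def gbellmf(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(inputs, premise, consequent):
    """Evaluate the five ANFIS layers for one sample.

    inputs     -- (x, y, z), e.g. normalized voltage, frequency, erase/write speed
    premise    -- per rule, three (a, b, c) triples, one for each input
    consequent -- per rule, one (p, q, r, s) tuple
    """
    x, y, z = inputs
    # Layers 1 and 2: membership degrees, multiplied into firing strengths.
    strengths = []
    for rule_mfs in premise:
        w = 1.0
        for value, (a, b, c) in zip(inputs, rule_mfs):
            w *= gbellmf(value, a, b, c)
        strengths.append(w)
    # Layer 3: normalize the firing strengths.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Layer 4: weight each rule's first-order consequent.
    weighted = [
        w_bar * (p * x + q * y + r * z + s)
        for w_bar, (p, q, r, s) in zip(normalized, consequent)
    ]
    # Layer 5: sum all rule contributions into the single output.
    return sum(weighted)
```

In the setting described here, the returned value would play the role of the predicted (normalized) average memory temperature.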

It should be noted that the fuzzy membership function parameters in ANFIS (both premise and consequent parameters) are obtained by generating an initial fuzzy model from a large amount of known data and then training it. Training ANFIS through this iterative adaptive learning process optimizes the premise and consequent parameters of the model, finally fixing the membership function parameter values that fit the training data set. In each training iteration the error between the actual output and the expected output can be reduced, and training stops when the predetermined number of iterations or the predetermined error rate is reached.

During the next time period, the measured memory temperature, voltage, runtime frequency, and erase/write speed are again actively collected once every 1 s, and every ten data points are averaged to obtain the indirect factor information "dataset_1". Within the same period, the average voltage, average runtime frequency, and average erase/write speed are used as inputs to the ANFIS model obtained in the previous cycle to produce the corresponding average temperature output. This output is compared with the measured average memory temperature; if the comparison error is below the set threshold, the indirect factor information indicates that the memory had no abnormality during this period.

At the end of that period, dataset_1 is added to the total indirect factor dataset, and new training samples are obtained after normalization. Retraining then starts from the ANFIS model rules obtained in the previous cycle, so the ANFIS model rules can be updated quickly.

Similarly, in each subsequent time period, the operations described for the preceding time period and training moment can be repeated to continue acquiring training samples and training the ANFIS model.

It should be noted that, to keep the data set from growing too large, a sliding time window can be set so that the total indirect factor dataset contains at most the data collected in the most recent three time periods for which the memory was reported as normal. The acquisition of training samples and the training of the ANFIS model can be carried out periodically at a fixed interval, or non-periodically when triggered manually; the description above covers only the periodic mode.
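One simple way to realize such a sliding window is a bounded queue of per-period data sets. The Python sketch below assumes that a period's data is admitted only when the memory was reported as normal for that period; the class name `IndirectFactorDataset` and its methods are hypothetical.

```python
from collections import deque


class IndirectFactorDataset:
    """Keep the averaged rows of at most the last `max_periods` normal periods."""

    def __init__(self, max_periods=3):
        # A bounded deque silently drops the oldest period when full.
        self._periods = deque(maxlen=max_periods)

    def add_period(self, rows, memory_normal):
        """Store one period's rows only if the memory was reported as normal."""
        if memory_normal:
            self._periods.append(list(rows))

    def training_rows(self):
        """Return all retained rows, oldest period first."""
        return [row for period in self._periods for row in period]
```

The rows returned by `training_rows` would then be normalized and used for the next retraining cycle.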

Further, in step 203, determining the health impact factor according to the error value includes: setting the health impact factor to 1 when the error value is detected to be less than a target error threshold; and, when the error value is detected to be greater than the target error threshold, obtaining a preset memory health policy and determining the health impact factor according to that policy.

It should be noted that, in the embodiments of the present application, when the result reported by the indirect factors is a memory abnormality, its effect on the memory health score is necessarily negative; the embodiments therefore set rules for the health impact factor.

Further, the target error threshold is determined from the root mean square error corresponding to the target mean square error vector between the predicted output data and the actual output data.

It should be noted that, when the input-output model is trained on the sample data, the minimum mean square error vector of the training data can be obtained and its root mean square error computed. According to a preset empirical formula, three times this root mean square error can be set as the threshold that applies until the next update of the input-output model rules, and it is used to quantify the feedback result obtained from the indirect factors.

If the comparison error between the output of the input-output model and the measured data is less than three times this root mean square error, the memory has no abnormality; that is, the indirect factors do not affect the memory health policy, and the health impact factor is set to 1. If the comparison error is greater than three times this root mean square error, the indirect factors are set to affect the memory health policy, and the value of the health impact factor varies with the size of the error.
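The threshold test can be sketched as follows. The Python code assumes the comparison error is the absolute difference between one predicted and one measured value, and it treats an error exactly equal to the threshold as abnormal, a case the text leaves open. The mapping from error size to factor is not specified in this passage, so it is passed in as a caller-supplied `policy` function; that parameter and both function names are hypothetical.

```python
import math


def rmse(errors):
    """Root mean square of a non-empty sequence of training errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def health_impact_factor(predicted, actual, training_errors, policy):
    """Return 1.0 when the prediction error is below 3x the training RMSE.

    Otherwise defer to `policy(error, threshold)`, a caller-supplied rule
    standing in for the preset memory health policy.
    """
    threshold = 3.0 * rmse(training_errors)
    error = abs(predicted - actual)
    if error < threshold:
        # Indirect factors report no abnormality: leave the score unchanged.
        return 1.0
    return policy(error, threshold)
```

A caller might, for example, pass a policy that returns a smaller factor as the error grows; the exact shape of that policy is a design decision outside this sketch.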

步骤204,根据所述初步内存健康度分数和所述健康度影响因子确定当前内存状态。Step 204: determine the current memory state according to the preliminary memory health score and the health impact factor.

需要说明的是,上述步骤204参照前序论述,在此不再赘述。It should be noted that the above step 204 is described with reference to the previous discussion and will not be repeated here.

本申请实施例通过分析影响服务器内存健康运行的各种因素,将不同影响因素对应的内存数据分为两类,即第一内存数据和第二内存数据,其中,第一存储数据对应的第一历史存储数据可以生成预设内存健康度评估模型,第一存储数据可以根据预设内存健康度评估模型确定内存对应的初步内存健康度分数;其次,可以通过对第二历史内存数据进行处理,对预设初始模型进行训练,从而生成输入输出模型,通过将第二内存数据输入至训练好的输入输出模型后,可以确定内存的健康度影响因子,进而根据初步内存健康度分数和健康度影响因子可以确定当前内存状态,本申请通过根据内存数据对内存的影响将内存数据划分为两类,通过健康度影响因子调节内存健康度分数,可以有效且准确的对内存健康情况进行检测。The embodiment of the present application analyzes various factors that affect the healthy operation of the server memory, and divides the memory data corresponding to different influencing factors into two categories, namely, first memory data and second memory data, wherein the first historical storage data corresponding to the first storage data can generate a preset memory health assessment model, and the first storage data can determine the preliminary memory health score corresponding to the memory according to the preset memory health assessment model; secondly, the second historical memory data can be processed to train the preset initial model to generate an input-output model, and the health influencing factor of the memory can be determined by inputting the second memory data into the trained input-output model, and then the current memory state can be determined according to the preliminary memory health score and the health influencing factor. The present application divides the memory data into two categories according to the impact of the memory data on the memory, and adjusts the memory health score according to the health influencing factor, so as to effectively and accurately detect the memory health status.

另外,本申请实施例中基于输入输出模型,获取了内存实测温度、电压、运行频率、擦写速度信息之间的映射关系。设计了建立和更新输入输出模型规则的方式,可根据实测数据在线更新输入输出模型,可根据输入输出模型预测结果在一定程度上预警内存风险。In addition, in the embodiment of the present application, the mapping relationship between the measured temperature, voltage, operating frequency, and erase speed information of the memory is obtained based on the input-output model. A method for establishing and updating the input-output model rules is designed, and the input-output model can be updated online according to the measured data, and the memory risk can be warned to a certain extent according to the prediction results of the input-output model.

In addition, the embodiments of the present application introduce the concept of a memory health influencing factor, and give a detailed method for obtaining it as well as the way it is combined with the health score produced by the preliminary memory health evaluation strategy.
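A minimal sketch of how the health influencing factor could be derived, following the rule recited in claims 1 and 4: compare predicted and actual output data to obtain an error value, set the factor to 1 when the error is below the target error threshold, and otherwise defer to a preset memory health policy. The mean absolute error and the default policy `threshold / error` are assumptions made for illustration; the source does not define the policy in this section.

```python
def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and actual outputs (assumed error metric)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def health_factor(error, threshold, policy=None):
    """Return 1.0 when error < threshold; otherwise apply a memory health policy.

    `policy` is a hypothetical callable (error, threshold) -> factor.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if error < threshold:
        return 1.0
    if policy is None:
        # Hypothetical default: the factor shrinks as the error grows.
        policy = lambda e, t: t / e
    return policy(error, threshold)
```

The case where the error equals the threshold is not covered by the source; the sketch routes it to the policy.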

Referring to FIG. 3, a third flowchart of the steps of the memory state detection method provided by an embodiment of the present application is shown. The method may include:

Step 301: obtaining memory data;

Step 302: dividing the memory data into first memory data and second memory data according to the type of the memory data;

Step 303: determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health influencing factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model based on second historical memory data;

It should be noted that steps 301 to 303 are as described above and are not repeated here.
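Step 302 can be pictured as a simple partition of each sample by field type. The sketch below assumes a sample is a flat dictionary and uses hypothetical key names; the grouping itself follows claims 9 and 16, where the first memory data covers hard faults, error counts, error types and enabled error repair operations, and the second memory data covers average voltage, operating frequency, erase/write speed and temperature.

```python
# Hypothetical field names for the two categories of memory data.
FIRST_KEYS = {"hard_faults", "error_count", "error_types", "repair_ops"}
SECOND_KEYS = {"voltage", "frequency", "erase_write_speed", "temperature"}


def split_memory_data(sample):
    """Split one sample into (first_memory_data, second_memory_data)."""
    first = {k: v for k, v in sample.items() if k in FIRST_KEYS}
    second = {k: v for k, v in sample.items() if k in SECOND_KEYS}
    return first, second
```

For example, `split_memory_data({"error_count": 3, "voltage": 1.2})` returns `({"error_count": 3}, {"voltage": 1.2})`; keys outside both sets are dropped.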

Step 304: determining a target memory health score of the memory at the current moment according to the preliminary memory health score, the health influencing factor at the current moment, and the health influencing factor within the previous preset time period corresponding to the current moment.

It should be noted that, in the embodiments of the present application, the target memory health score of the memory at the current moment is determined from the preliminary memory health score, the health influencing factor at the current moment, and the health influencing factor within the previous preset time period corresponding to the current moment.

Here, the health influencing factor within the previous preset time period corresponding to the current moment refers to the health influencing factor of the time period immediately preceding the current moment, where a time period may be one of the interval periods described above.

Specifically, the target memory health score is generated by the following formula:

(Formula 7)

In Formula 7, the symbols denote, in order: the target memory health score; the preliminary memory health score; the health influencing factor at the current moment; the health influencing factor within the previous preset time period corresponding to the current moment; the average of the error values between multiple predicted output data and the actual output data; and the target error threshold.

Step 305: determining the state of the memory at the current moment according to the target memory health score.

Further, step 305 may include: upon detecting that the target memory health score is less than the target health threshold, or that the health influencing factor is less than or greater than 1, sending warning information to the user so that the user can view the current memory state through a preset display interface.

It should be noted that, in the embodiments of the present application, the results can be displayed through a Web interface. By entering the server login address, the user can query the health evaluation results of each memory module installed in the server; a schematic of the display interface is shown in FIG. 10.

In addition, upon detecting that the target memory health score is less than the target health threshold, or that the health influencing factor is less than or greater than 1, warning information is sent to the user so that the user can view the current memory state through the preset display interface. The preset display interface is the Web interface described above, through which the user can view detailed information on each memory module, as shown in FIG. 11. If the health score of any memory module is below 60 points, or its health influencing factor is not 1, the user is proactively warned.
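The warning condition above reduces to a small predicate. The sketch below uses the 60-point threshold stated in the text; the function name and the floating-point tolerance `tol` are assumptions added so that a factor computed as 1.0 with rounding noise does not trigger a warning.

```python
def needs_warning(target_score, factor, score_threshold=60.0, tol=1e-9):
    """True when the score is below the threshold or the factor differs from 1."""
    return target_score < score_threshold or abs(factor - 1.0) > tol
```

A caller might evaluate this once per memory module per time period and push a notification whenever it returns `True`.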

In addition, while providing an effective scalar (quantitative) evaluation of memory health, the embodiments of the present application can warn of memory anomalies based on the memory health score and the value of the health influencing factor. When the memory health score falls to the anomaly threshold, the user is prompted to replace the module with healthy hardware.

Referring to FIG. 5, a memory state detection device provided by an embodiment of the present application is shown. The device may include:

an acquisition module 501, configured to obtain memory data;

a division module 502, configured to divide the memory data into first memory data and second memory data according to the type of the memory data;

a first determination module 503, configured to determine a preliminary memory health score according to a preset memory health assessment model and the first memory data, and to process the second memory data through an input-output model to determine a health influencing factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model based on second historical memory data;

a second determination module 504, configured to determine the current memory state according to the preliminary memory health score and the health influencing factor.

An embodiment of the present application further provides a communication device, as shown in FIG. 6, including a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 communicate with one another through the communication bus 604.

The memory 603 is configured to store a computer program;

The processor 601, when executing the program stored in the memory 603, can implement the following steps:

obtaining memory data;

dividing the memory data into first memory data and second memory data according to the type of the memory data;

determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health influencing factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model based on second historical memory data;

determining the current memory state according to the preliminary memory health score and the health influencing factor.

The communication bus of the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the above terminal and other devices.

The memory may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present application, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the memory state detection method described in any one of the above embodiments.

In yet another embodiment provided by the present application, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to perform the memory state detection method described in any one of the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or third database to another website, computer, server or third database by wired means (for example coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or third database that integrates one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid state disk (SSD)).

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.

The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant details, refer to the corresponding description of the method embodiment.

The above are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application falls within its scope of protection.

Claims (19)

1. A memory state detection method, characterized in that the memory state detection method comprises: obtaining memory data; dividing the memory data into first memory data and second memory data according to the type of the memory data; determining a preliminary memory health score according to a preset memory health assessment model and the first memory data, and processing the second memory data through an input-output model to determine a health influencing factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model based on second historical memory data; and determining a current memory state according to the preliminary memory health score and the health influencing factor; wherein the second memory data includes actual input data and actual output data in one-to-one correspondence; and wherein processing the second memory data through the input-output model to determine the health influencing factor comprises: inputting the actual input data into the input-output model to obtain predicted output data corresponding to the actual input data; comparing the predicted output data with the actual output data to obtain an error value; and determining the health influencing factor according to the error value.

2. The method according to claim 1, characterized in that, before the step of inputting the actual input data into the input-output model to obtain the predicted output data corresponding to the actual input data, the method comprises: preprocessing the actual input data collected within multiple preset time periods to generate data sets corresponding to the multiple preset time periods; selecting, from the data sets corresponding to the multiple preset time periods, a target data set for which the memory state is normal within three preset time periods corresponding to the current moment; and normalizing the target data set to obtain training samples.

3. The method according to claim 2, characterized in that, after the step of normalizing the target data set to obtain training samples, the method comprises: training the preset initial model according to the training samples to obtain the input-output model.

4. The method according to claim 1, characterized in that determining the health influencing factor according to the error value comprises: upon detecting that the error value is less than a target error threshold, setting the health influencing factor to 1; and upon detecting that the error value is greater than the target error threshold, obtaining a preset memory health policy and determining the health influencing factor according to the memory health policy.

5. The method according to claim 4, characterized in that the target error threshold is determined according to the root mean square error corresponding to a target mean square error vector between the predicted output data and the actual output data.

6. The method according to claim 1, characterized in that determining the current memory state according to the preliminary memory health score and the health influencing factor comprises: determining a target memory health score of the memory at the current moment according to the preliminary memory health score, the health influencing factor at the current moment, and the health influencing factor within the previous preset time period corresponding to the current moment; and determining the state of the memory at the current moment according to the target memory health score.

7. The method according to claim 6, characterized in that the target memory health score is generated by the following formula: wherein, in the formula, the symbols denote, in order: the target memory health score; the preliminary memory health score; the health influencing factor at the current moment; the health influencing factor within the previous preset time period corresponding to the current moment; the average of the error values between multiple predicted output data and the actual output data; and the target error threshold.

8. The method according to claim 6, characterized in that determining the state of the memory at the current moment according to the target memory health score comprises: upon detecting that the target memory health score is less than a target health threshold, or that the health influencing factor is less than or greater than 1, sending warning information to a user so that the user can view the current memory state through a preset display interface.

9. The memory state detection method according to claim 1, characterized in that the first memory data is obtained from register logs, and the first memory data includes at least one of the following: memory hard faults, the number of memory errors, memory error types, and enabled error repair operations.

10. The memory state detection method according to claim 9, characterized in that the preset memory health assessment model comprises: upon detecting a preset number of memory hard faults, deducting a first preset score; upon detecting that the number of memory errors within a preset time is greater than a preset threshold, deducting a second preset score; determining, according to the memory error type, a third preset score corresponding to the memory error type; and determining, according to the enabled error repair operation, a fourth preset score corresponding to the enabled error repair operation.

11. The memory state detection method according to claim 10, characterized in that the memory error type includes at least one of the following: memory hard errors, memory soft errors, SRAO errors (handling optional), UCNA errors (no handling required), SRAR errors (handling required), and sudden fatal errors.

12. The memory state detection method according to claim 11, characterized in that determining, according to the memory error type, the third preset score corresponding to the memory error type comprises: upon detecting that the memory error type is one of a memory hard error, a memory soft error, an SRAO error, a UCNA error and an SRAR error, deducting the third preset score; and upon detecting that the memory error type is a sudden fatal error, setting the preliminary memory health score to 0.

13. The memory state detection method according to claim 10, characterized in that the enabled error repair operation includes at least one of the following: consuming the technique of replacing a faulty row with a redundant row within the memory chip, enabling the Bank-level adaptive dual DRAM device correction (ADDDC) function, enabling the Rank-level adaptive dual DRAM device correction (ADDDC) function, and enabling the storm suppression function.

14. The memory state detection method according to claim 13, characterized in that the scores corresponding to the enabled error repair operations are ordered from smallest to largest as follows: consuming PCLS, enabling the Bank-level ADDDC function, enabling the Rank-level ADDDC function, and enabling the storm suppression function.

15. The memory state detection method according to claim 1, characterized in that the data set corresponding to the input-output model includes four columns of vector data, and the input-output model maps three input data to one output data.

16. The memory state detection method according to claim 15, characterized in that the three input data include the average voltage, the average operating frequency and the average erase/write speed of the memory, and the output data includes the average temperature of the memory.

17. A memory state detection device, characterized in that the device comprises: an acquisition module, configured to obtain memory data; a division module, configured to divide the memory data into first memory data and second memory data according to the type of the memory data; a first determination module, configured to determine a preliminary memory health score according to a preset memory health assessment model and the first memory data, and to process the second memory data through an input-output model to determine a health influencing factor, wherein the preset memory health assessment model is determined based on first historical memory data, and the input-output model is generated by training a preset initial model based on second historical memory data; and a second determination module, configured to determine a current memory state according to the preliminary memory health score and the health influencing factor; wherein the second memory data includes actual input data and actual output data in one-to-one correspondence; and wherein processing the second memory data through the input-output model to determine the health influencing factor comprises: inputting the actual input data into the input-output model to obtain predicted output data corresponding to the actual input data; comparing the predicted output data with the actual output data to obtain an error value; and determining the health influencing factor according to the error value.

18. A communication device, characterized by comprising: a transceiver, a memory, a processor, and a program stored in the memory and executable on the processor; the processor being configured to read the program in the memory to implement the memory state detection method according to any one of claims 1 to 16.

19. A readable storage medium for storing a program, characterized in that, when the program is executed by a processor, the memory state detection method according to any one of claims 1 to 16 is implemented.
Application CN202310935420.8A | Priority date 2023-07-28 | Filing date 2023-07-28 | Memory state detection method, device, communication equipment and storage medium | Status: Active | Publication: CN116680112B (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN202310935420.8A (published as CN116680112B (en)) | 2023-07-28 | 2023-07-28 | Memory state detection method, device, communication equipment and storage medium
PCT/CN2024/087888 (published as WO2025025683A1 (en)) | 2023-07-28 | 2024-04-16 | Memory state detection method and apparatus, and communication device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310935420.8A (published as CN116680112B (en)) | 2023-07-28 | 2023-07-28 | Memory state detection method, device, communication equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN116680112A (en) | 2023-09-01
CN116680112B (en) | 2023-11-03

Family

ID=87785814

Family Applications (1)

Application Number | Title | Status | Priority Date | Filing Date
CN202310935420.8A | Memory state detection method, device, communication equipment and storage medium | Active | 2023-07-28 | 2023-07-28

Country Status (2)

Country | Link
CN (1) | CN116680112B (en)
WO (1) | WO2025025683A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116680112B (en) * | 2023-07-28 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Memory state detection method, device, communication equipment and storage medium
CN119179615A (en) * | 2024-11-25 | 2024-12-24 | 皇虎测试科技(深圳)有限公司 | Memory bank fault positioning protection method, device, medium and equipment
CN119322699B (en) * | 2024-12-16 | 2025-04-08 | 皇虎测试科技(深圳)有限公司 | Method for processing memory bank initialization failure
CN120233957A (en) * | 2025-05-30 | 2025-07-01 | 深蓝汽车科技有限公司 | Memory optimization method, model training method, device, equipment and vehicle
CN120336030B (en) * | 2025-06-10 | 2025-09-05 | 苏州元脑智能科技有限公司 | Network page memory management method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109522175A (en) * | 2017-09-18 | 2019-03-26 | 华为技术有限公司 | A method and device for memory evaluation
CN114883013A (en) * | 2022-06-08 | 2022-08-09 | 中国工商银行股份有限公司 | Health state evaluation method and device and computer equipment
CN115186924A (en) * | 2022-07-28 | 2022-10-14 | 网思科技股份有限公司 | Equipment health state evaluation method and device based on artificial intelligence
CN115793990A (en) * | 2023-02-06 | 2023-03-14 | 天翼云科技有限公司 | Memory health state determination method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10310749B2 (en) * | 2016-09-16 | 2019-06-04 | Netscout Systems Texas, Llc | System and method for predicting disk failure
CN116680112B (en) * | 2023-07-28 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Memory state detection method, device, communication equipment and storage medium

Also Published As

Publication number | Publication date
WO2025025683A1 (en) | 2025-02-06
CN116680112A (en) | 2023-09-01

Similar Documents

Publication | Publication Date | Title
CN116680112B (en) | | Memory state detection method, device, communication equipment and storage medium
US9600394B2 (en) | | Stateful detection of anomalous events in virtual machines
US7769562B2 (en) | | Method and apparatus for detecting degradation in a remote storage device
CN110399238B (en) | | Disk failure early warning method, device, equipment and readable storage medium
US10248561B2 (en) | | Stateless detection of out-of-memory events in virtual machines
CN112088364B (en) | | Use a machine learning module to determine whether to perform error checking of a storage unit
US12260347B2 (en) | | Systems and methods for predicting storage device failure using machine learning
CN114443398B (en) | | Memory fault prediction model generation method, detection method, device and equipment
US20160371180A1 (en) | | Free memory trending for detecting out-of-memory events in virtual machines
CN111104238B (en) | | CE-based memory diagnosis method, device and medium
US20240231994A9 (en) | | Anomaly detection using metric time series and event sequences for medical decision making
CN116069606B (en) | | Software system performance fault prediction method and system
US20250238306A1 (en) | | Interactive data processing system failure management using hidden knowledge from predictive models
WO2021185182A1 (en) | | Anomaly detection method and apparatus
CN118838748A (en) | | Memory fault prediction method and device, storage medium and electronic equipment
CN114186031A (en) | | System fault prediction method, device, computer equipment and storage medium
CN119939277A (en) | | Equipment fault identification method, system and storage medium
CN118523963B (en) | | A method and system for assessing power information network security risks
CN119274834A (en) | | A method and system for diagnosing nuclear power plant equipment faults
US20250238303A1 (en) | | Interactive data processing system failure management using hidden knowledge from predictive models
US20250238307A1 (en) | | Interactive data processing system failure management using hidden knowledge from predictive models
CN118245348A (en) | | A memory fault prediction maintenance method, device, equipment and medium
US20250036971A1 (en) | | Managing data processing system failures using hidden knowledge from predictive models
CN112463523A (en) | | Memory bank health state monitoring method, device, equipment and storage medium
US12314120B2 (en) | | System and method for predicting system failure and time-to-failure based on attribution scores of log data

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | CP03 | Change of name, title or address | (details below)

Address after: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province
Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.
Country or region after: China

Address before: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province
Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.
Country or region before: China

