JP2006146668A


Info

Publication number: JP2006146668A
Application number: JP2004337411A
Authority: JP
Inventors: Yosuke Miyamoto; 洋輔宮本; Takashi Horie; 高志保理江
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2004-11-22
Filing date: 2004-11-22
Publication date: 2006-06-08

Abstract

PROBLEM TO BE SOLVED: To provide an operation management support apparatus capable of detecting performance degradation of a system more flexibly and presenting the causes of, and countermeasures for, the degradation in detail.

SOLUTION: The operation management support apparatus includes: an operation information collection unit 101 that collects operation information of each element at predetermined time intervals; an operation information DB 102 that stores the collected operation information; a preprocessing unit 103 that calculates statistical analysis values between the stored operation information of the elements; a model information DB 105 that stores in advance statistical analysis values between the operation information of the elements for each of a plurality of operation models; and a determination unit 106 that identifies the operation model corresponding to the collected operation information by comparing the statistical analysis values calculated by the preprocessing unit 103 with those stored in the model information DB 105.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

Translated from Japanese

The present invention relates to an operation management support apparatus and an operation management support program suitable for supporting the operation management of a system such as a network system.

Conventional operation management collects two kinds of information from each monitored server and network device: hardware operation information, and whether each program is running. It then presents alerts and operation information for each system component.

For example, the apparatus described in Patent Document 1 issues an alert when an operating condition occurs that satisfies a range of conditions predefined in a database. From the collected information, this apparatus can also predict when performance degradation will be reached, using linear or curve approximation.

However, it basically does not consider abnormal faults that are not defined in advance. A fixed value range is predefined for the operation information of each monitored component, and performance degradation is determined by detecting whether a value falls inside or outside that range. For the predefined cases, the apparatus narrows down the cause by correlation analysis of the detection result and displays the corresponding countermeasure.

In the apparatus described in Patent Document 2, on the other hand, performance degradation is determined by detecting outliers from normal operation. The apparatus correlates two quantities and applies a statistical test to detect a degraded state:
- a controlled variable (data based on values derived from the measurements);
- a manipulated variable (data based on the operation performed).

However, because it does not acquire data for each component, it cannot determine the cause of the degradation or the appropriate countermeasure.

Patent Document 1: JP 2004-145536 A
Patent Document 2: JP 2003-208219 A

When a failure occurs in the system, the administrator either infers the state of the system from per-component alerts and operation information, or responds by following a predefined manual.

The first approach depends heavily on the administrator's skill. Resolving failures and performance degradation can therefore take time, and human error may cause secondary damage.

The second approach relies on strictly defined conditions. When an acquired value falls outside the definitions yet is a dangerous sign, no response may be taken, or an incorrect one may be made.

In the apparatus of Patent Document 1, the numerical ranges, causes, and countermeasures of degradation are predefined, and detection and countermeasures can be presented only within those ranges. Outside them, the apparatus merely extrapolates the values with straight lines or curves, without considering relationships between components. It may therefore be unable to handle situations that fall outside the predefined ones.

The apparatus of Patent Document 2 detects an abnormality by building a model of normal operation from past data and comparing current data against it. Its purpose, however, extends only to detecting that something is abnormal; it cannot indicate where the problem lies.

The present invention has been made in view of the above circumstances. Its object is to provide an operation management support apparatus that can detect performance degradation more flexibly and present the causes of, and countermeasures for, the degradation in detail.

To solve the above problems, the invention of claim 1 comprises:
- operation information collection means that collects operation information of three or more elements at predetermined time intervals;
- operation information storage means that stores the operation information collected by the operation information collection means;
- statistical processing means that calculates statistical analysis values between the stored operation information of the elements;
- model information storage means that stores in advance statistical analysis values between the operation information of the elements for each of a plurality of operation models;
- a determination unit that identifies the operation model corresponding to the collected operation information by comparing the statistical analysis values calculated by the statistical processing means with those stored in the model information storage means.

The invention of claim 2 is characterized in that the determination unit finds an operation model whose comparison result matches and, if no comparison result matches, finds a similar operation model.

The invention of claim 3 is characterized in that the determination unit first seeks a similar operation model by a statistical test. If the test finds none, it seeks a similar operation model based on the number of statistical analysis values whose comparison results do not match.

The invention of claim 4 further comprises two means that act when the determination unit finds an operation model:
- notification means that notifies a predetermined destination of that fact;
- corresponding-application cooperation means that executes a predetermined application.

The invention of claim 5 is characterized in that the corresponding-application cooperation means does not execute the predetermined application when the model found is a similar operation model.
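The combined behaviour of claims 4 and 5 can be sketched as a small dispatch rule: notify whenever a model is found, but run the automatic countermeasure only on an exact match. This is a minimal illustration, not the patent's implementation.

The following are hypothetical stand-ins:
- `Decision`, `dispatch`, `notify` and `run_countermeasure`;
- the two callbacks represent the alert notification unit 108 and the corresponding-application cooperation unit 110.

Notifying on a similar model as well is an assumption based on claim 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    model_id: str
    exact: bool  # True: comparison matched; False: only a similar model was found


def dispatch(decision: Optional[Decision],
             notify: Callable[[str], None],
             run_countermeasure: Callable[[str], None]) -> List[str]:
    """Claims 4 and 5 sketch: notify on any match, act automatically only on an exact match.

    Returns the list of actions taken so the behaviour is easy to inspect.
    """
    if decision is None:
        return []  # no operation model found: nothing to report
    actions = []
    notify(decision.model_id)
    actions.append("notify")
    if decision.exact:
        # A similar (inexact) model may be a misdetection, so no automatic action then.
        run_countermeasure(decision.model_id)
        actions.append("run")
    return actions
```

Keeping the "similar model" path notification-only reflects the rationale given for claim 5. A wrongly matched similar model then produces at most an alert for a human to check, never an automatic change.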

The invention of claim 6 is characterized in that the statistical analysis values are correlation coefficients.

The invention of claim 7 is characterized in that the statistical analysis values are principal component analysis results.

The invention of claim 8 is characterized in that the operation information includes both hardware operation information and application operation information.

The invention of claim 9 is characterized in that the operation models stored in the model information storage means are created from operation information actually collected while each element is operating.

The invention of claim 10 is a program containing instructions for causing a computer to execute the following steps:
- an operation information collection step of collecting operation information of three or more elements at predetermined time intervals;
- an operation information storage step of storing the operation information collected in the collection step;
- a statistical processing step of calculating statistical analysis values between the stored operation information of the elements;
- a determination step of identifying the operation model corresponding to the collected operation information. This step uses model information storage means that stores in advance statistical analysis values between the operation information of the elements for each of a plurality of operation models. It compares the statistical analysis values calculated in the statistical processing step with those stored in the model information storage means.

In the invention of claim 1, no fixed detection range is set for the operation information of each element. Instead, the corresponding operation model is identified by comparing statistical analysis values between the operation information of multiple elements (for example, correlation coefficients or principal component analysis results) with the statistical analysis values of a plurality of predefined operation models.

In other words, normal and failure situations are defined with statistical operation models. The apparatus determines, from the statistical analysis values between the operating states of the elements, which operation model the current state matches or resembles. It can thus present normal situations as well as failure situations such as performance degradation, so degradation can be detected more flexibly.

In addition, because operation information is collected for each element, the causes of degradation and the countermeasures can be presented in detail.

According to the invention of claim 2, the determination unit finds an operation model whose comparison result matches and, if none matches, finds a similar operation model. Performance degradation can therefore be detected more flexibly.

According to the invention of claim 3, the determination unit first seeks a similar operation model by a statistical test. If none is found, it seeks one based on the number of statistical analysis values whose comparison results do not match. This improves the accuracy of finding a similar model.

According to the invention of claim 4, the apparatus further includes the notification means and the corresponding-application cooperation means. Predetermined responses and notifications can therefore be made automatically when the determination unit finds an operation model.

According to the invention of claim 5, the corresponding-application cooperation means does not execute the predetermined application when the model found is only a similar operation model. This prevents an inappropriate countermeasure from being carried out automatically when an unexpected operating state causes a wrong similar model to be detected.

According to the invention of claim 6, the statistical analysis values are correlation coefficients. The operating state can therefore be identified more accurately when pairs of operation information items show sufficiently strong correlations that vary with the operating state.

According to the invention of claim 7, the statistical analysis values are principal component analysis results. A correlation coefficient matrix merely arranges pairwise correlations between two variables. Principal component analysis of n variables, by contrast, yields n components that characterize the data jointly. The operating state can therefore be identified more accurately when the principal component analysis results show sufficiently distinctive features.
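The patent does not say how the principal component analysis is computed or which of its outputs are stored. The sketch below therefore makes two assumptions:
- PCA is performed on the correlation matrix of the operation information;
- only the first principal component (the largest eigenvalue and its loading vector) is needed.

It computes that component by power iteration, which converges when the largest eigenvalue is strictly separated from the others. The function name and the iteration count are hypothetical. A full PCA would return all n eigenpairs, for example via a linear-algebra library.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def first_principal_component(corr: Matrix, iters: int = 200) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric correlation matrix (power iteration)."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from a uniform unit vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: no meaningful component
        v = [x / norm for x in w]
        # Rayleigh quotient v^T C v (v has unit length) estimates the eigenvalue.
        eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v
```

The loading vector shows which metrics move together in the dominant pattern. The eigenvalue shows how much of the total variance (the trace, n) that pattern explains.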

According to the invention of claim 8, the operation information includes both hardware and application operation information, so causes can be identified in more detail.

According to the invention of claim 9, the operation models stored in the model information storage means are created from operation information actually collected while each element is operating. This makes it easier for the reference operation models to match the real system.

An embodiment of the operation management support apparatus according to the present invention is described below with reference to the drawings. FIG. 1 is a system diagram showing the configuration of the embodiment.

In FIG. 1, a monitored system 10 is connected to a network 1, such as the Internet, via a communication line 2. A management server 100, which serves as the operation monitoring apparatus of this embodiment, is connected to the monitored system 10.

The management server 100 monitors the operating state of the monitored system 10. It collects hardware operation information and information on whether each program is running. It then presents alerts and operation information for each system component, either on its own display device 100a or on the display device 200a of a client terminal 200.

The monitored system 10 provides various communication services to the network 1. It consists of:
- a firewall 11, which restricts unauthorized access to the network 12 from the network 1;
- a network 12 connecting the devices;
- a NIDS (Network Intrusion Detection System) 13, which detects intrusions into the network 12 and can recognize attacks that the firewall 11 cannot detect;
- a Web server 14, a computer that provides content viewed with a Web browser or the like via the network 1;
- an AP (application) server 15, a computer that executes predetermined application programs in response to requests from Web browsers and the like via the network 1; for example, it sits between the Web server 14 and the DB server 16 and performs predetermined processing;
- a DB (database) server 16, a computer that stores and organizes information such as data and files and provides predetermined information in response to requests from Web browsers and the like via the network 1.

Each of the devices 11 and 13 to 16 in the monitored system 10 has one or more of the monitored components in this embodiment.

Each device sends one or more items of operation information to the management server 100, either on request from the management server 100 or on its own initiative. Examples include:
- CPU (central processing unit) usage;
- number of DB queries (searches);
- number of accesses;
- number of error log records;
- number of access log records;
- number of transmitted packets;
- number of NIDS alerts.

This operation information is transferred via the network 12 using protocols such as ICMP (Internet Control Message Protocol) and SNMP (Simple Network Management Protocol), or rsh (remote shell).

The management server 100 is a computer that provides the functions of the operation monitoring apparatus by executing a predetermined program on a predetermined OS (operating system) using its hardware resources. In FIG. 1, the functions of the management server 100, each implemented in software, hardware, or a combination of the two, are shown as blocks.

The management server 100 includes:
- an operation information collection unit 101;
- an operation information DB 102;
- a preprocessing unit 103;
- a model generation unit 104;
- a model information DB 105;
- a determination unit 106;
- a user interface 107;
- an alert notification unit 108;
- a correspondence information DB 109;
- a corresponding-application cooperation unit 110.

The operation information collection unit 101 acquires operation information at fixed time intervals from the multiple devices 11 and 13 to 16 (three or more components) in the monitored system 10, using ICMP, SNMP, rsh, and the like. It collects two kinds of information:
- hardware operation information, such as CPU and network I/O (input/output);
- application operation information, such as the Web server access volume and the number of queries processed by the DB server.

It stores this information in the operation information DB 102, which keys the data (operation information) acquired by the operation information collection unit 101 by timestamp.

FIG. 2 shows an example of the data stored in the operation information DB 102. In this example, the following are recorded at one-minute intervals:
- the CPU usage of server 1 (for example, the Web server 14);
- the CPU usage of server 2 (for example, the AP server 15);
- the number of DB queries (corresponding to the number of searches on the DB server 16);
- the number of processed accesses (for example, the number of requests handled by a given function of the Web server 14).
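A minimal sketch of the timestamp-keyed operation information DB follows. It assumes an in-memory dictionary in place of a real database, and the class and metric names (`OperationInfoDB`, `cpu_server1`, `cpu_server2`) are hypothetical. Each collection round is stored as one row keyed by its timestamp. `window` returns one time-ordered series per metric, which is the shape the correlation step needs.

```python
from datetime import datetime
from typing import Dict, List


class OperationInfoDB:
    """In-memory stand-in for the operation information DB: one row of metrics per timestamp."""

    def __init__(self, metric_names: List[str]) -> None:
        self.metric_names = list(metric_names)
        self.rows: Dict[datetime, Dict[str, float]] = {}  # the timestamp is the key

    def store(self, ts: datetime, sample: Dict[str, float]) -> None:
        """Store one collection round; every configured metric must be present."""
        missing = set(self.metric_names) - set(sample)
        if missing:
            raise ValueError(f"missing metrics: {sorted(missing)}")
        self.rows[ts] = {m: float(sample[m]) for m in self.metric_names}

    def window(self, start: datetime, end: datetime) -> Dict[str, List[float]]:
        """Return one time-ordered series per metric for timestamps in [start, end)."""
        keys = sorted(ts for ts in self.rows if start <= ts < end)
        return {m: [self.rows[ts][m] for ts in keys] for m in self.metric_names}
```

Because rows are sorted by timestamp when a window is read, samples may arrive out of order, for example when some devices push data on their own initiative and others are polled.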

The preprocessing unit 103 preprocesses (statistically processes) the data used by the model generation unit 104 or the determination unit 106. Specifically, it calculates statistical analysis values between the operation information of the components stored in the operation information DB 102.

For example, it takes the operation information acquired from the devices 11 and 13 to 16 and stored in the operation information DB 102. It then either calculates correlation coefficients between each pair of operation information items, or performs principal component analysis across them.

These statistical analysis values indicate how strongly the operation information of the devices is related at a given time. FIG. 3 shows an example of the processing result of the preprocessing unit 103.

FIG. 3 shows the result of calculating correlation coefficients for the operation information of FIG. 2, obtained as a correlation coefficient matrix. In this example, the correlation coefficients are:
- 0.93 between the CPU usage of server 1 and the CPU usage of server 2;
- 0.75 between the CPU usage of server 1 and the number of DB queries;
- 0.89 between the CPU usage of server 2 and the number of DB queries.

Correlation analysis expresses the relationship between the variables of two-dimensional data with a single coefficient; the correlation coefficient is a number indicating the degree of correlation between two variables. Given a two-dimensional data set of pairs $(x_i, y_i)$, $x_i \in X$, $y_i \in Y$, $i = 1, \dots, n$, the correlation coefficient $R_{XY}$ is given by equation (1) below and lies in the range $-1 \le R_{XY} \le 1$:

$$R_{XY} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1)$$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$.
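A minimal sketch of equation (1) and of building a correlation coefficient matrix like the one in FIG. 3 follows. The function names are hypothetical, and the input is assumed to be one equal-length series per metric, for example as returned by a time-window query on the operation information DB. The text does not say how constant series are handled, so this sketch raises an error because equation (1) is undefined for them.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Correlation coefficient R_XY of equation (1) for two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Symmetric matrix of pairwise R_XY values; the diagonal is 1 by definition."""
    names = list(series)
    mat = [[1.0 if i == j else pearson(series[a], series[b])
            for j, b in enumerate(names)]
           for i, a in enumerate(names)]
    return names, mat
```

For example, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` gives 0.8. Only the off-diagonal entries carry information, and each pair appears twice, so a stored model needs just the upper triangle.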

The absolute value $|R_{XY}|$ of the correlation coefficient is generally interpreted as follows:
- $|R_{XY}| \le 0.2$: almost no correlation;
- $0.2 \le |R_{XY}| \le 0.4$: weak correlation;
- $0.4 \le |R_{XY}| \le 0.7$: moderate correlation;
- $0.7 \le |R_{XY}|$: strong correlation.
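The interpretation bands above can be encoded as a small lookup. The bands as written share their endpoints (0.2, 0.4, 0.7), so this sketch assigns each boundary value to the lower band; that tie-breaking choice is an assumption, and the function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to the conventional strength bands (boundaries go to the lower band)."""
    a = abs(r)
    if a <= 0.2:
        return "almost none"
    if a <= 0.4:
        return "weak"
    if a <= 0.7:
        return "moderate"
    return "strong"
```

Applied to the FIG. 3 values, 0.93, 0.75 and 0.89 all fall in the "strong" band.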

The model generation unit 104 in FIG. 1 defines multiple operation models that serve as references (standards) for the apparatus, and registers or adds them to the model information DB 105. Each operation model can be created by producing the desired state, for example by deliberately causing a failure, and then actually collecting operation information from each component with the operation information collection unit 101.

The operation models represent operating states in a predetermined format, for example:
- the normal state;
- a normal state with relatively increased access;
- abnormal load on an AP application;
- AP server failure;
- increased error accesses;
- frequent hard disk errors.

Each model is defined by statistical analysis values between the operation information of the components. FIGS. 4 and 5 show an example of the information registered in the model information DB 105 (the model information table).

In the example of FIG. 4, each record of the model information table corresponds to one of several operation models; here there are two, "AP application abnormal load" and "increased error access". Each record consists of:
- a model ID (identification), which identifies the operation model;
- a correspondence ID, which specifies the response to be taken when that state occurs;
- a notification ID, which specifies what should be notified, such as the alert destination, when that state occurs;
- model data, which are the statistical analysis values between the operation information of the components acquired in that state.

The specific contents associated with correspondence IDs and notification IDs are stored in the correspondence information DB 109 of FIG. 1. When correlation coefficients are used as the statistical analysis values, the model data form a correlation coefficient matrix of the coefficients between the operation information of the components, as shown in FIG. 5.

When actual data are compared against a model, each model's definition data serve as the median (center) of the reference values, and the comparison allows a predetermined width around that median. That is, each model is defined by values with a predetermined width centered on the median. In this embodiment, however, the definition data registered in the model information DB 105 first undergo the following median adjustment, which makes the models easier to distinguish from one another.
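A minimal sketch of one model information table record follows. The field names, the element keys (such as `"cpu1~cpu2"` for one off-diagonal matrix element) and the default width of 0.2 are hypothetical; the text only speaks of "a predetermined width". The check that at least one of the correspondence ID and notification ID is present comes from the model-addition inputs described below.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ModelRecord:
    """One row of a hypothetical model information table (cf. FIG. 4)."""
    model_id: str
    correspondence_id: Optional[str]  # which response to take (details in the correspondence information DB)
    notification_id: Optional[str]    # what or whom to alert
    medians: Dict[str, float]         # model data: one median per matrix element
    range_width: float = 0.2          # assumed default comparison width

    def __post_init__(self) -> None:
        if self.correspondence_id is None and self.notification_id is None:
            raise ValueError("at least one of correspondence_id / notification_id is required")

    def bounds(self) -> Dict[str, Tuple[float, float]]:
        """Comparison range [median - w/2, median + w/2] for each element."""
        half = self.range_width / 2
        return {k: (m - half, m + half) for k, m in self.medians.items()}
```

Storing the median plus a single width keeps each record compact, and the width can still be tuned per model.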

FIG. 5 shows an example of a correlation coefficient matrix registered in the model information DB 105. It was obtained by the model generation unit 104 applying median adjustment to the correlation coefficient matrix shown in FIG. 3, which was created by the preprocessing unit 103.

The median adjustment keeps each element of the correlation coefficient matrix from being identical or very close to the corresponding element of previously registered models. The adjustment is described with reference to FIG. 6.

FIG. 6 is a flowchart of the model addition process performed by the model generation unit 104, including the median adjustment. The inputs are the preprocessed intermediate data (see FIG. 3), a comment, a correspondence ID, and a notification ID. The comment and IDs are supplied by the model creator, and at least one of the correspondence ID and the notification ID must be given.

In the data replacement step (step S11), each input value is replaced by the median of a range of predefined width. The median may be changed as long as the input value remains within the resulting range. For example, if an input value is 0.65 and the range width is 0.2, the median can be set anywhere from 0.55 to 0.75 (see FIG. 7).

The range width is a tuning parameter that can be changed for each model. The input value itself is used as the initial median.
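The arithmetic of step S11 can be sketched as follows: a median m is valid when the input value x stays inside [m − w/2, m + w/2], that is, when m lies in [x − w/2, x + w/2]. Function names are hypothetical, and clamping a proposed median into that interval is one plausible way to enforce the constraint, not a detail given in the text.

```python
from typing import Tuple


def median_interval(value: float, range_width: float) -> Tuple[float, float]:
    """Interval where the median may be placed so that `value` stays inside median +/- range_width/2."""
    half = range_width / 2
    return value - half, value + half


def replace_median(value: float, range_width: float, candidate: float) -> float:
    """Step S11 sketch: accept a proposed median, clamped to the permissible interval."""
    lo, hi = median_interval(value, range_width)
    return min(max(candidate, lo), hi)
```

`median_interval(0.65, 0.2)` reproduces the FIG. 7 example, (0.55, 0.75) up to floating-point rounding. A proposed median of 0.9 would be clamped to 0.75.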

If the predefined limit N has not been exceeded ("No" in step S12), the new model is compared with an existing model (step S14). If the new model's ranges overlapped an existing model's ranges in every element, data falling into those ranges would be judged to match both models. The comparison therefore checks that no such model is registered.

If there is a clear difference ("Yes" in step S15), step S14 is repeated for each record already registered in the model information DB 105, changing the existing model being compared each time (steps S16 → S13 → S14 → S15 → S16 → ...).

In step S15, as shown for example in FIG. 8, it is checked whether the overlap (thick arrow) between the range of the existing model and that of the current input data is at most a predetermined value. If it is, a clear difference is deemed to exist.
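The overlap check of steps S14 and S15 can be sketched with interval arithmetic. The text says a model must not be registered if its ranges overlap an existing model's in every element. This sketch therefore reads "clear difference" as: at least one element's overlap is no larger than the threshold. That reading, the function names, and the `max_overlap` parameter are assumptions.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]


def overlap(a: Range, b: Range) -> float:
    """Length of the intersection of two closed intervals (0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def clearly_different(new: Dict[str, Range], existing: Dict[str, Range], max_overlap: float) -> bool:
    """Step S15 sketch: True if some element overlaps the existing model by at most max_overlap."""
    return any(overlap(new[k], existing[k]) <= max_overlap for k in new)
```

For instance, ranges (0.55, 0.75) and (0.7, 0.9) overlap by about 0.05, which would count as clearly different with a threshold of 0.05.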

If there is no clear difference ("No" in step S15), the portions with less overlap are scanned (step S18). In step S18, for each element judged to show no difference, the portions where the ranges overlap least are listed. These are passed as replacement candidates to the data replacement process of step S11, which changes the median on the basis of the candidates.

Once the input data differ clearly from every existing model, they are added to the model information DB 105 (step S17), together with the comment, response ID, and notification ID. If the data replacement of step S11 is repeated more than the predetermined limit count N ("Yes" in step S12), processing is judged impossible and control passes to error handling.
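
The FIG. 6 loop as a whole can be sketched as below. This is a simplified sketch, not the patent's procedure, and it rests on two assumptions. First, a blocking model is one whose ranges overlap the new ranges by more than the threshold in every element. Second, step S18 is modelled by a hypothetical rule: shift the median of the least-overlapping element away from the blocking model by a fixed `step`, staying inside the allowed median window. `add_model`, `threshold`, `limit_n`, and `step` are illustrative names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Range = Tuple[float, float]


def overlap(a: Range, b: Range) -> float:
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def add_model(values: Sequence[float], width: float, existing: Dict[str, List[Range]],
              threshold: float, limit_n: int, step: float) -> List[Range]:
    """Return ranges that differ clearly from every existing model, or raise after N replacements."""
    centers = list(values)            # step S11: the initial median is the input value
    replacements = 0
    while True:
        ranges = [(c - width / 2, c + width / 2) for c in centers]
        # Steps S13-S16: look for an existing model that overlaps in every element
        blocking: Optional[List[Range]] = None
        for model in existing.values():
            if all(overlap(r, m) > threshold for r, m in zip(ranges, model)):
                blocking = model
                break
        if blocking is None:
            return ranges             # step S17: caller stores these with comment and IDs
        replacements += 1
        if replacements > limit_n:    # "Yes" in step S12: hand over to error handling
            raise RuntimeError("replacement limit N exceeded; model cannot be registered")
        # Step S18 (hypothetical rule): move the least-overlapping element's median
        # away from the blocking model, clamped to the window that keeps the value in range.
        i = min(range(len(ranges)), key=lambda k: overlap(ranges[k], blocking[k]))
        m_center = (blocking[i][0] + blocking[i][1]) / 2
        direction = 1.0 if centers[i] >= m_center else -1.0
        lo, hi = values[i] - width / 2, values[i] + width / 2
        centers[i] = min(max(centers[i] + direction * step, lo), hi)
```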

Through the above configuration and processing, operation models are defined in the model information DB 105. The monitoring process that uses these models is described next with reference to FIG. 1.

The determination unit 106 in FIG. 1 matches unknown data (newly collected operation information) against the defined operation models. It operates mutually exclusively with the model generation unit 104.

The operation information of the monitored system 10 is collected by the operation information collection unit 101 and stored in the operation information DB 102. The preprocessing unit 103 then processes it statistically and passes it to the determination unit 106. The determination unit 106 compares these statistical analysis values with those stored in the model information DB 105 to determine which operation model corresponds to the collected information.

In this embodiment, the unit first searches for an exactly matching model. If none is found, it searches for a similar model. The determination unit 106 runs this processing periodically, or when triggered by events such as a failure alert or a drop in the response of the provided service.

The processing by the determination unit 106 is described with reference to FIG. 9. When the statistical analysis value is the correlation coefficient, for example, the input data are preprocessed intermediate data like those shown in FIG. 3.

First, at the start of a new run, the storage area that records the number of out-of-range elements (see step S25) is initialized.

In step S22, the input is compared with an existing model. For each element of the intermediate data (each element of the correlation coefficient matrix), the unit determines whether it falls within the model's range for that element. For example, suppose the current data are compared with a model, meaning the ranges of the elements in the currently selected record of the model information DB 105. If the current element lies within the range drawn as a bold line, as shown in FIG. 10, that element is judged OK (within range). This is repeated for every element, so that all elements of the intermediate data are compared with the corresponding elements of the existing model.

Next, the unit determines whether all elements fall within the model's ranges (step S23). If they do ("Yes" in step S23), the current operating state is presumed to be that model's state. The comment, response ID, and notification ID are output to the user interface 107, the alert notification unit 108, and the response application cooperation unit 110 in FIG. 1, and processing ends.

If some element falls outside its range ("No" in step S23), the unit determines whether the number of out-of-range elements is smaller than the count recorded for other models. If it is ("Yes" in step S24), the count and the model ID are recorded in a predetermined storage area (step S25).

The processing from step S22 onward is executed for each record (each model) in the model information DB 105. If a model has all its elements in range, processing ends there ("Yes" in step S23). If no model has all its elements in range, processing proceeds to the similar matching flow (step S27). The ID of the model with the fewest recorded out-of-range elements, and that count, are passed as arguments.
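
The exact-matching loop of FIG. 9 can be sketched as follows. The sketch assumes the intermediate data have been flattened into a list of values, such as the upper triangle of the correlation matrix. It also assumes each model is a list of `(low, high)` ranges in the same order. `strict_match` and `count_out_of_range` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple

Range = Tuple[float, float]


def count_out_of_range(values: Sequence[float], model: Sequence[Range]) -> int:
    """Step S22: number of elements that fall outside the model's ranges."""
    return sum(1 for v, (lo, hi) in zip(values, model) if not lo <= v <= hi)


def strict_match(values: Sequence[float],
                 models: Dict[str, Sequence[Range]]) -> Tuple[Optional[str], Optional[int]]:
    """FIG. 9 sketch: (model_id, 0) on an exact match, else the model with fewest misses."""
    best_id: Optional[str] = None     # storage area used in step S25, cleared at start
    best_count: Optional[int] = None
    for model_id, model in models.items():
        misses = count_out_of_range(values, model)
        if misses == 0:               # "Yes" in step S23: exact match, stop searching
            return model_id, 0
        if best_count is None or misses < best_count:   # step S24
            best_id, best_count = model_id, misses      # step S25
    return best_id, best_count        # step S27: hand over to similar matching
```

A nonzero second value means no exact match was found, and the pair is what the similar matching flow receives.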

FIG. 11 is a flowchart of an example of the similar matching flow. The input data are the preprocessed intermediate data (see FIG. 3) and the ID of the model with the fewest out-of-range elements recorded in step S25 of FIG. 9.

No new model search is performed; the model specified by the input model ID is output as the similar model. Before that, the comparison is performed again (step S31) and the out-of-range elements are listed (step S32). A similarity flag, the comment, and the notification ID are then output to the user interface 107 and the alert notification unit 108, and processing ends. Because the result is a similarity rather than a match, no response application is triggered.
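
The output of the similar matching flow can be sketched as below. `labels` (names for the flattened elements, such as metric pairs) and the dictionary layout of `info` are hypothetical. The response ID is left out on purpose, reflecting the rule that a similar match triggers no response application.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]


def similar_match(values: Sequence[float], labels: Sequence[str], model_id: str,
                  models: Dict[str, Sequence[Range]], info: Dict[str, dict]) -> dict:
    """FIG. 11 sketch: report the given model as 'similar' and list the deviating elements."""
    deviating: List[str] = [
        label for label, v, (lo, hi) in zip(labels, values, models[model_id])
        if not lo <= v <= hi                       # steps S31-S32
    ]
    record = info[model_id]
    # Only the flag, comment and notification ID are output; the response ID is omitted
    # because no response application is triggered for a similar match.
    return {
        "model_id": model_id,
        "similar": True,
        "deviating_elements": deviating,
        "comment": record["comment"],
        "notification_id": record["notification_id"],
    }
```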

In the determination flow described above with reference to FIGS. 9 and 11, exact matching is performed first (step S41), as shown in FIG. 12. Similar matching by count (step S42) is performed only if no exact match is found.

Accuracy can be improved further by adding similar matching based on a statistical test, as shown in FIG. 13:
1. Exact matching is performed first (step S51).
2. If no exact match is found, matching using a test is performed (step S52).
3. If that still yields no model satisfying the predetermined condition, similar matching by count is performed (step S53).
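
The three-stage order of FIG. 13 can be sketched as a small dispatcher. Each stage is passed in as a hypothetical callable that returns a model ID or `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

Values = Sequence[float]


def determine(values: Values,
              exact: Callable[[Values], Optional[str]],
              by_test: Callable[[Values], Optional[str]],
              by_count: Callable[[Values], Optional[str]]) -> Tuple[str, Optional[str]]:
    """FIG. 13 sketch: exact matching, then test-based matching, then matching by count."""
    model_id = exact(values)                  # step S51
    if model_id is not None:
        return "match", model_id
    model_id = by_test(values)                # step S52
    if model_id is not None:
        return "similar", model_id
    return "similar", by_count(values)        # step S53
```

Passing a `by_test` that always returns `None` reduces this to the two-stage flow of FIG. 12.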

FIG. 14 is a flowchart of an example of the processing when the determination unit 106 in FIG. 1 performs test-based matching (an additional form of similar matching).

For this, the model data in the model information DB 105 must define a variance-covariance matrix and the sample count, in addition to the correlation coefficient matrix. The variance-covariance matrix is defined in the same way as the correlation coefficient matrix described above. The preprocessing unit 103 likewise supplies the variance-covariance matrix and sample count obtained from the collected operation information, along with the correlation coefficient matrix. The out-of-range count and model ID are not needed as inputs.

In the similarity determination flow of FIG. 14, a test is first performed using two matrices: the variance-covariance matrix of an existing model in the model information DB 105 and the input variance-covariance matrix (step S62). The resulting test statistic is compared with the critical value at a predetermined significance level.
- If the null hypothesis is accepted ("No" in step S63), that model is determined to be the similar model.
- If it is rejected ("Yes" in step S63), the next model is selected and the test is repeated (steps S64 → S61 → S62 → ...).

If no model is accepted after this has been repeated for every record in the model information DB 105, that is, if all models are rejected, processing moves to the similar matching by count described with reference to FIG. 11 (step S65).
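
The FIG. 14 loop can be sketched as below, with the statistical test abstracted as a hypothetical `rejects` predicate that returns True for "Yes" in step S63. Each model record is assumed to carry whatever the test needs, such as S′ and N′.

```python
from typing import Callable, Dict, Optional, TypeVar

M = TypeVar("M")


def match_by_test(models: Dict[str, M], rejects: Callable[[M], bool]) -> Optional[str]:
    """FIG. 14 sketch: ID of the first model whose test is not rejected, or None."""
    for model_id, model in models.items():    # steps S61 and S64: take the next record
        if not rejects(model):                # steps S62-S63: "No" means accepted
            return model_id
    return None                               # all rejected: go to matching by count (S65)
```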

The test in steps S62 and S63 can be carried out as follows. Let $S$ and $N$ be the input's variance-covariance matrix (a $P \times P$ matrix obtained from the $P$ input operation metrics) and its sample count. Let $S'$ and $N'$ be the model's variance-covariance matrix and sample count. The test for a difference between them then proceeds as follows.

1. From the variance-covariance matrix and the sample count, obtain the determinant of the sum-of-squares-and-cross-products matrix using equation (2) below.
2. Obtain the determinant of the sum-of-squares-and-cross-products matrix for the model and the input combined, using equation (3) below.
3. Compute Wilks' Λ statistic using equation (4) below.
4. Choose a significance level α and perform the test by comparing the test statistic $F_0$ as in equation (5) below. Here, $F_{(P,\,N+N'-P-1)}(\alpha)$ denotes the upper 100α% point of the F distribution with $(P,\ N+N'-P-1)$ degrees of freedom.
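
Equations (2) to (5) are not reproduced in this text, so their exact form is unknown. As a hedged illustration only, the sketch below implements the textbook two-group Wilks' Λ test. It uses Λ = |W| / |T|, where W is the pooled within-group sum-of-squares-and-cross-products (SSCP) matrix and T is the SSCP matrix of the combined samples. It then computes F₀ = ((N + N′ − P − 1) / P) · (1 − Λ) / Λ, which has the same (P, N + N′ − P − 1) degrees of freedom as step 4.

Three caveats apply:
- Strictly, this textbook form tests equality of mean vectors.
- It needs the raw samples (or at least the group means) to build T, whereas the patent's inputs are only S, N, S′, N′. Treat it as an assumption, not the patent's formula.
- The critical value is passed in rather than computed, which keeps the sketch dependency-free.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Rows = Sequence[Sequence[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def mean_vector(rows: Rows) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(rows: Rows) -> Matrix:
    """Sum-of-squares-and-cross-products matrix about the rows' own mean."""
    mu = mean_vector(rows)
    p = len(mu)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[i] - mu[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return out


def wilks_test(x_input: Rows, x_model: Rows) -> Tuple[float, float, Tuple[int, int]]:
    """Return (Lambda, F0, (df1, df2)) for the textbook two-group Wilks test."""
    p = len(x_input[0])
    n, n_model = len(x_input), len(x_model)
    a, b = sscp(x_input), sscp(x_model)
    within = [[a[i][j] + b[i][j] for j in range(p)] for i in range(p)]   # W
    total = sscp(list(x_input) + list(x_model))                          # T
    lam = det(within) / det(total)    # assumes W is non-singular (enough, non-collinear samples)
    df2 = n + n_model - p - 1
    f0 = (df2 / p) * (1.0 - lam) / lam
    return lam, f0, (p, df2)


def model_rejected(x_input: Rows, x_model: Rows, f_critical: float) -> bool:
    """True if F0 exceeds the critical value, i.e. "Yes" in step S63."""
    return wilks_test(x_input, x_model)[1] > f_critical
```

For α = 0.05 and (2, 7) degrees of freedom, the tabulated critical value is about 4.74.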

Once the determination unit 106 has found a matching or similar operation model as described above, the result is presented or used as follows. The user interface 107 in FIG. 1 presents information supplied by the determination unit 106 to the administrator (operator). It does so on the display device 100a or the display device 200a, through a GUI (Graphical User Interface) such as a Web browser or a dedicated viewer.

The alert notification unit 108 reports the status and related information based on two sources: information supplied by the determination unit 106 and information stored in the response information DB 109. It notifies the administrator by e-mail, pager, or similar means. It notifies integrated operation management software positioned above this apparatus through a network socket or similar means.

When a matching operation model has been found, the response application cooperation unit 110 instructs, through an API (Application Program Interface), an application that is configured to be triggered when a predetermined operating state is determined. It bases the instruction on information supplied by the determination unit 106 and information stored in the response information DB 109.

The response information DB 109 stores information on alert transmission and response applications. This information is defined in advance; an example is shown in FIG. 15. In that example, the response information DB 109 holds two tables:
- a response action table 109a, which defines for each response ID the applications that can be invoked and the actions to take;
- an alert notification table 109b, which defines for each notification ID what is reported to the administrator and to other operation management and security management tools.

These response IDs and notification IDs correspond to those defined in the model information DB 105 described with reference to FIG. 4.

For example, suppose the current operation information is judged to match the operation model with model ID "001" in the model information DB 105 (model information table) of FIG. 4. That model is "AP application abnormal load", with response ID "003" and notification ID "002". The alert notification unit 108 then reports the status to the e-mail address (of the form "XX@OO") set for notification ID "002" in the alert notification table 109b. The response application cooperation unit 110 performs the processing needed to carry out the action set for response ID "003" in the response action table 109a ("add AP server").
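
The lookup in this example can be sketched with plain dictionaries standing in for the tables of FIG. 4 and FIG. 15. The table contents mirror the example above; "XX@OO" is kept as the placeholder the text gives. `dispatch`, `send_alert`, and `run_action` are hypothetical hooks for the alert notification unit 108 and the response application cooperation unit 110.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the model information table and the FIG. 15 tables.
MODEL_INFO: Dict[str, Dict[str, str]] = {
    "001": {"comment": "AP application abnormal load", "response_id": "003",
            "notification_id": "002"},
}
RESPONSE_ACTIONS: Dict[str, str] = {"003": "add AP server"}      # response action table 109a
ALERT_NOTIFICATIONS: Dict[str, str] = {"002": "XX@OO"}           # alert notification table 109b


def dispatch(model_id: str, similar: bool,
             send_alert: Callable[[str, str], None],
             run_action: Callable[[str], None]) -> None:
    """Route a determination result to the alert unit 108 and the response unit 110."""
    record = MODEL_INFO[model_id]
    notification_id = record.get("notification_id")
    if notification_id:
        send_alert(ALERT_NOTIFICATIONS[notification_id], record["comment"])
    response_id = record.get("response_id")
    if response_id and not similar:   # a similar (not matching) model triggers no application
        run_action(RESPONSE_ACTIONS[response_id])
```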

As described above, this embodiment first collects operation information periodically from the monitored servers, network devices, and so on. This includes hardware operation information such as CPU usage and application-level information such as the access status of a Web server (HTTP (HyperText Transfer Protocol) server).

The operation information is gathered in each situation, such as normal access or a failure. From it, statistical methods such as correlation analysis or principal component analysis compute the "relationships among the acquired values" that characterize each situation. A model of each situation is then defined and held in the model information DB 105.
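
The correlation-analysis step can be sketched in plain Python. It assumes every metric is sampled at the same time points and that no metric is constant, since a constant metric would make its norm zero. `correlation_matrix` and `upper_triangle` are hypothetical names. The upper triangle gives the flattened element list that the matching sketches above compare against model ranges.

```python
from math import sqrt
from typing import Dict, List, Sequence


def correlation_matrix(series: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Pearson correlation between every pair of metrics sampled at the same times."""
    names = list(series)
    n = len(series[names[0]])
    dev = {}
    for name in names:
        mean = sum(series[name]) / n
        dev[name] = [x - mean for x in series[name]]
    norm = {name: sqrt(sum(d * d for d in dev[name])) for name in names}  # must be non-zero
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in names] for i in names]


def upper_triangle(matrix: List[List[float]]) -> List[float]:
    """Elements above the diagonal, in row order: the values compared with model ranges."""
    p = len(matrix)
    return [matrix[i][j] for i in range(p) for j in range(i + 1, p)]
```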

During operation, the same statistical method used to define the models is applied to the current operation information. This happens periodically, or when triggered by events such as a failure alert or a drop in the response of the provided service. The result is compared with the models in the model information DB 105, and the matched model's situation is identified as the current situation. The administrator can therefore understand the state of the system easily and respond in a way suited to the system as a whole, rather than component by component.

If an operation model for normal operation is defined in the model information DB 105, a situation that matches no model can be interpreted further. In such a situation, some of the relationships among acquired values that hold under normal operation have broken down. Even if this is not an abnormal state requiring countermeasures, it can be regarded as a slight drift from normal.

In such a case, this embodiment presents the most similar model in the model information DB 105 together with its differences. The administrator can then infer that an abnormality exists, or may arise, where the relationships among acquired values have broken down.

According to this embodiment, the state of the system can be identified and presented in more detail than before. This supports the administrator's assessment of the situation, enables a prompt and appropriate response that takes the whole system into account, and reduces human error.

Defining normal and failure situations with statistical models also makes it possible to alert the administrator to an abnormality when acquired values fluctuate, or when the situation is close to, but not yet, a failure.

For example, when the response of a Web application system degrades, possible situations and causes include:
- a high volume of legitimate requests;
- performance degradation due to abnormal requests;
- processing concentrated somewhere in the Web server, AP server, DB server, or network;
- a failure.

Each of these requires a different response. This embodiment helps determine the situation and isolate the cause, which reduces operation management costs.

Besides the combinations of operation model and response described above, the following are also conceivable:
- "Normal access, but high server load" → reinforce the servers before a server goes down.
- "Access that appears to come from a worm is increasing" → block access from the offending host before the bandwidth is used up.
- "Server load is high even though access volume is not" → deal with the anomaly in that server's processes while continuing the service.

When principal component analysis is used as the statistical analysis method in the above embodiment, the following configuration is possible. In principal component analysis, when the principal components (eigenvalues) are derived, the first principal component generally acts as an overall score. The second and subsequent components capture distinguishing characteristics. Comparing on the basis of the second and subsequent components is therefore expected to distinguish the operating states more clearly.
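
The idea of setting the first component aside can be sketched with NumPy, eigen-decomposing the correlation matrix of the samples. The text does not say which quantities of the second and later components are compared. This sketch therefore assumes their eigenvalues and loadings, with each loading vector's sign fixed so that results from different time windows are comparable. The function names are hypothetical.

```python
import numpy as np


def principal_components(samples: np.ndarray):
    """Eigen-decompose the correlation matrix of samples (rows = times, columns = metrics).

    Returns eigenvalues in descending order and the matching loadings as columns.
    """
    corr = np.corrcoef(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvectors are defined only up to sign; make each column's largest entry positive.
    idx = np.argmax(np.abs(eigvecs), axis=0)
    signs = np.sign(eigvecs[idx, np.arange(eigvecs.shape[1])])
    return eigvals, eigvecs * signs


def features_after_first(samples: np.ndarray):
    """Drop the first component (overall score) and keep the second and later ones."""
    eigvals, loadings = principal_components(samples)
    return eigvals[1:], loadings[:, 1:]
```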

The embodiment of the present invention is not limited to the above. The components of FIG. 1 may be integrated, divided, distributed across a network, or supplemented with other components as appropriate. For example, in the configuration of FIG. 1, a firewall can be added between the management server 100 and the network 12 to restrict packets sent to the management server 100. Operation models also need not be created from actually collected operation information. They can be created by design, or by reusing operation models from a similar system.

The management server 100 can be implemented by a computer and its peripheral devices together with a program executed using those hardware resources. The program can be distributed on a computer-readable recording medium or over a communication line.

FIG. 1: Block diagram of the configuration of an embodiment of the operation management support system of the present invention.
FIG. 2: Configuration diagram of an example of the operation information DB 102 of FIG. 1.
FIG. 3: Diagram of an example of the correlation coefficient matrix output from the preprocessing unit 103 of FIG. 1.
FIG. 4: Configuration diagram of an example of the model information DB 105 of FIG. 1.
FIG. 5: Configuration diagram of an example of the model data of FIG. 4.
FIG. 6: Flowchart of an example of the processing (model addition) of the model generation unit 104 of FIG. 1.
FIG. 7: Explanatory diagram of the processing of FIG. 6.
FIG. 8: Explanatory diagram of the processing of FIG. 6.
FIG. 9: Flowchart of an example of the processing (exact matching) of the determination unit 106 of FIG. 1.
FIG. 10: Explanatory diagram of the processing of FIG. 9.
FIG. 11: Flowchart of an example of the processing (similar matching by count) of the determination unit 106 of FIG. 1.
FIG. 12: Flowchart of an example of the processing (overall flow) of the determination unit 106 of FIG. 1.
FIG. 13: Flowchart of another example of the processing (overall flow) of the determination unit 106 of FIG. 1.
FIG. 14: Flowchart of an example of the processing (similar matching by test) of the determination unit 106 of FIG. 1.
FIG. 15: Configuration diagram of an example of the response information DB 109 of FIG. 1.

Explanation of Symbols

1…network, 10…monitored system, 11…firewall, 12…network, 13…NIDS, 14…Web server, 15…AP server, 16…DB server, 100…management server, 101…operation information collection unit, 102…operation information DB, 103…preprocessing unit (statistical processing), 104…model generation unit, 105…model information DB, 106…determination unit, 107…user interface, 108…alert notification unit, 109…response information DB, 110…response application cooperation unit, 200…client

Claims


An operation management support apparatus comprising:
operation information collection means for collecting operation information of three or more elements at predetermined time intervals;
operation information storage means for storing the operation information collected by the operation information collection means;
statistical processing means for obtaining statistical analysis values among the stored operation information of the elements;
model information storage means for storing in advance statistical analysis values among the operation information of the elements corresponding to each of a plurality of operation models; and
a determination unit for obtaining an operation model corresponding to the collected operation information by comparing the statistical analysis values obtained by the statistical processing means with the statistical analysis values stored in the model information storage means.

The operation management support apparatus according to claim 1, wherein the determination unit obtains an operation model for which the comparison results match and, if the comparison results do not match, obtains a similar operation model.

The operation management support apparatus according to claim 2, wherein the determination unit obtains a similar operation model by a statistical test and, if no similar operation model is obtained by the test, obtains a similar operation model on the basis of the number of statistical analysis values for which the comparison results do not match.

The operation management support apparatus according to any one of claims 1 to 3, further comprising:
notification means for notifying a predetermined notification destination when an operation model has been obtained by the determination unit; and
response application cooperation means for executing a predetermined application when an operation model has been obtained by the determination unit.

The operation management support apparatus according to claim 4, wherein the response application cooperation means does not execute the predetermined application when the obtained model is a similar operation model.

The operation management support apparatus according to any one of claims 1 to 5, wherein the statistical analysis values are correlation coefficients.

The operation management support apparatus according to any one of claims 1 to 5, wherein the statistical analysis values are results of principal component analysis.

The operation management support apparatus according to any one of claims 1 to 7, wherein the operation information includes both hardware operation information and application operation information.

The operation management support apparatus according to any one of claims 1 to 8, wherein the operation models stored in the model information storage means are created on the basis of operation information actually collected while the elements are in operation.