JP2019028879A


Info

Publication number: JP2019028879A
Application number: JP2017149996A
Authority: JP
Inventors: 浩太笠原; Kota Kasahara
Original assignee: Ritsumeikan Trust
Current assignee: Ritsumeikan Trust
Priority date: 2017-08-02
Filing date: 2017-08-02
Publication date: 2019-02-21
Anticipated expiration: 2037-08-02
Also published as: JP7048065B2

Abstract

[Problem] To provide a binding prediction method that achieves high prediction accuracy and improved computation speed. [Solution] The method includes: step S11 of acquiring a designation of a target biopolymer and the three-dimensional structure of a compound whose binding is to be predicted; step S12 of acquiring the three-dimensional structure of the biopolymer corresponding to the designation from a three-dimensional structure database that stores three-dimensional structures of biopolymers; step S13 of generating, based on the acquired three-dimensional structure, a predicted three-dimensional structure of a complex of the biopolymer and the compound; step S14 of matching the generated predicted three-dimensional structure against a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues of biopolymers, and converting it into a predicted three-dimensional structure vector representing the result of the matching; and step S15 of predicting the binding between the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound by classifying the predicted three-dimensional structure vector with a machine learning algorithm. [Selected drawing] FIG. 11

Description

Translated from Japanese

The present invention relates to a method, a device, a program, and a recording medium for predicting the binding between the three-dimensional structure of a target biopolymer and the three-dimensional structure of a compound, and to a method for producing a machine learning algorithm used for the binding prediction.

Developing a new drug currently takes a very long time and an enormous amount of money. In the field of drug discovery, various methods for searching for or optimizing drug-candidate compounds are therefore being explored in order to make drug development more efficient. One such method under study is in silico screening, in which ligands that bind to a target biopolymer (for example, a protein) are screened by computer simulation. A docking simulation, one type of such computer simulation, predicts the stable structure of a complex on a computer based on information about the three-dimensional structures of the protein and the compound.

Methods that use computer simulation to search for compounds that bind to a drug-candidate target include, for example, those based on molecular dynamics. One technique for predicting candidate compounds that bind to such a target biomolecule is the method described in Patent Document 1 below.

Patent Document 1 discloses a program and a support method for predicting the binding between a target biomolecule, including a protein, and a low-molecular-weight compound. The program and support method described in Patent Document 1 combine first to third simulations that use quantum chemical calculation and the like, which makes it possible to predict more accurately the candidate compounds that control the activity of the target biomolecule.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2016-166159

In the program and support method described in Patent Document 1, the interaction energy (the enthalpy term of the binding free energy) is calculated by quantum chemical calculation from coordinate data describing the arrangement of the protein and the low-molecular-weight compound. However, calculating the interaction energy precisely by quantum chemical calculation requires a workstation or supercomputer with high computing power, and the simulation takes a relatively long time. In addition, the time required for the simulation grows as the number of compounds whose binding is to be predicted increases.

An object of the present invention is to provide a binding prediction method, device, program, and recording medium, as well as a method for producing a machine learning algorithm used for binding prediction, that achieve high prediction accuracy and improved computation speed when predicting the binding between the three-dimensional structure of a target biopolymer and the three-dimensional structure of a compound.

To achieve the above object, the present invention includes the following aspects.
(Item 1)
A method comprising:
a step of acquiring a designation of a target biopolymer and the three-dimensional structure of a compound whose binding is to be predicted;
a step of acquiring the three-dimensional structure of the biopolymer corresponding to the designation from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
a step of generating a predicted three-dimensional structure of a complex of the biopolymer and the compound based on the acquired three-dimensional structure of the biopolymer and the three-dimensional structure of the compound;
a step of matching the generated predicted three-dimensional structure against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues of biopolymers, and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
a step of inputting the converted predicted three-dimensional structure vector into a machine learning algorithm and classifying the predicted three-dimensional structure vector with the machine learning algorithm, thereby predicting the binding between the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound.
(Item 2)
The method according to Item 1, wherein the training data used to train the machine learning algorithm is generated based on an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues.
(Item 3)
The method according to Item 1 or 2, wherein the interaction pattern database is generated by a method comprising:
a step of acquiring the three-dimensional structure of a complex of a biopolymer and a ligand from the three-dimensional structure database;
a step of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues;
a step of obtaining statistics of the spatial arrangement distribution of ligand atoms located around residues by repeating the step of acquiring the three-dimensional structure and the step of converting it into the spatial arrangement information; and
a step of defining a plurality of interaction patterns based on the statistics of the spatial arrangement distribution.
(Item A)
The method according to any one of Items 1 to 3, wherein the machine learning algorithm is a deep learning algorithm having a neural network structure.
(Item B)
The method according to Item A, wherein the three-dimensional structure of the compound whose binding is to be predicted includes a theoretically derived three-dimensional structure.
(Item C)
The method according to any one of Items 1 to 3 and A to B, wherein the biopolymer is a protein, a nucleic acid (DNA, RNA), or a polysaccharide.
(Item D)
The method according to any one of Items 1 to 3 and A to C, wherein the residue is any one selected from the group consisting of an amino acid residue, a nucleotide residue, and a monosaccharide residue.
(Item E)
The method according to any one of Items 1 to 3 and A to D, wherein the three-dimensional structure of the compound whose binding is to be predicted is acquired from the three-dimensional structure database.
(Item F)
The method according to any one of Items 1 to 3 and A to E, wherein the three-dimensional structure database is the Protein Data Bank.
(Item 4)
A device comprising:
prediction target acquisition means for acquiring a designation of a target biopolymer and the three-dimensional structure of a compound whose binding is to be predicted;
three-dimensional structure acquisition means for acquiring the three-dimensional structure of the biopolymer corresponding to the designation from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
predicted structure generation means for generating a predicted three-dimensional structure of a complex of the biopolymer and the compound based on the acquired three-dimensional structure of the biopolymer and the three-dimensional structure of the compound;
predicted vector conversion means for matching the generated predicted three-dimensional structure against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues of biopolymers, and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
binding prediction means for inputting the converted predicted three-dimensional structure vector into a machine learning algorithm and classifying the predicted three-dimensional structure vector with the machine learning algorithm, thereby predicting the binding between the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound.
(Item 5)
A program that causes a computer to implement:
a prediction target acquisition function of acquiring a designation of a target biopolymer and the three-dimensional structure of a compound whose binding is to be predicted;
a three-dimensional structure acquisition function of acquiring the three-dimensional structure of the biopolymer corresponding to the designation from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
a predicted structure generation function of generating a predicted three-dimensional structure of a complex of the biopolymer and the compound based on the acquired three-dimensional structure of the biopolymer and the three-dimensional structure of the compound;
a predicted vector conversion function of matching the generated predicted three-dimensional structure against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues of biopolymers, and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
a binding prediction function of inputting the converted predicted three-dimensional structure vector into a machine learning algorithm and classifying the predicted three-dimensional structure vector with the machine learning algorithm, thereby predicting the binding between the three-dimensional structure of the biopolymer and the three-dimensional structure of the compound.
(Item 6)
A computer-readable, non-transitory, tangible recording medium on which the program according to Item 5 is recorded.
(Item 7)
A method for producing a machine learning algorithm, comprising:
a step of acquiring the three-dimensional structure of a complex of a biopolymer and a ligand from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
a step of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues of the biopolymer;
a step of matching the spatial arrangement information against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues, and converting it into a spatial arrangement vector representing the result of matching against the interaction patterns;
a step of dividing the three-dimensional structure of the complex acquired from the three-dimensional structure database into the three-dimensional structure of the biopolymer and the three-dimensional structure of the ligand;
a step of generating a predicted three-dimensional structure of a complex of the biopolymer and the ligand based on the divided three-dimensional structure of the biopolymer and three-dimensional structure of the ligand;
a step of matching the generated predicted three-dimensional structure against the interaction pattern database and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
a step of training a machine learning algorithm using the predicted three-dimensional structure vector and the spatial arrangement vector as training data.
(Item 8)
The method for producing a machine learning algorithm according to Item 7, wherein the step of training the machine learning algorithm is a step of treating the spatial arrangement vector as a positive example, determining a label that marks the predicted three-dimensional structure vector as a positive or negative example, and training the machine learning algorithm with the predicted three-dimensional structure vector as the input layer and the label as the output layer.
(Item 9)
The method for producing a machine learning algorithm according to Item 7 or 8, wherein the interaction pattern database is generated by a method comprising:
a step of acquiring the three-dimensional structure of a complex of a biopolymer and a ligand from the three-dimensional structure database;
a step of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues;
a step of obtaining statistics of the spatial arrangement distribution of ligand atoms located around residues by repeating the step of acquiring the three-dimensional structure and the step of converting it into the spatial arrangement information; and
a step of defining a plurality of interaction patterns based on the statistics of the spatial arrangement distribution.
(Item 10)
A device for producing a machine learning algorithm, comprising:
complex acquisition means for acquiring the three-dimensional structure of a complex of a biopolymer and a ligand from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
spatial information conversion means for converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues of the biopolymer;
spatial vector conversion means for matching the spatial arrangement information against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues, and converting it into a spatial arrangement vector representing the result of matching against the interaction patterns;
complex division means for dividing the three-dimensional structure of the complex acquired from the three-dimensional structure database into the three-dimensional structure of the biopolymer and the three-dimensional structure of the ligand;
predicted structure generation means for generating a predicted three-dimensional structure of a complex of the biopolymer and the ligand based on the divided three-dimensional structure of the biopolymer and three-dimensional structure of the ligand;
predicted vector conversion means for matching the generated predicted three-dimensional structure against the interaction pattern database and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
learning means for training a machine learning algorithm using the predicted three-dimensional structure vector and the spatial arrangement vector as training data.
(Item 11)
A program that causes a computer to implement:
a complex acquisition function of acquiring the three-dimensional structure of a complex of a biopolymer and a ligand from a three-dimensional structure database that stores three-dimensional structures of biopolymers;
a spatial information conversion function of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues of the biopolymer;
a spatial vector conversion function of matching the spatial arrangement information against an interaction pattern database containing a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues, and converting it into a spatial arrangement vector representing the result of matching against the interaction patterns;
a complex division function of dividing the three-dimensional structure of the complex acquired from the three-dimensional structure database into the three-dimensional structure of the biopolymer and the three-dimensional structure of the ligand;
a predicted structure generation function of generating a predicted three-dimensional structure of a complex of the biopolymer and the ligand based on the divided three-dimensional structure of the biopolymer and three-dimensional structure of the ligand;
a predicted vector conversion function of matching the generated predicted three-dimensional structure against the interaction pattern database and converting it into a predicted three-dimensional structure vector representing the result of matching against the interaction patterns; and
a learning function of training a machine learning algorithm using the predicted three-dimensional structure vector and the spatial arrangement vector as training data.
(Item 12)
A computer-readable, non-transitory, tangible recording medium on which the program according to Item 11 is recorded.

The present invention makes it possible to provide a binding prediction method, device, program, and recording medium, as well as a method for producing a machine learning algorithm used for binding prediction, that achieve high prediction accuracy and improved computation speed.

FIG. 1 is a schematic configuration diagram of a binding prediction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the user-side device.
FIG. 3 is a flowchart showing the procedure for creating the interaction pattern database.
FIG. 4 is a schematic diagram for explaining the details of the interaction pattern database creation process.
FIG. 5 is a schematic diagram for explaining the procedure for converting the three-dimensional structure of a complex into spatial arrangement information of ligand atoms around amino acids.
FIG. 6 is a block diagram for explaining the functions of the deep learning device.
FIG. 7 is a flowchart showing the procedure of the deep learning process.
FIG. 8 is a schematic diagram for explaining the details of the deep learning process.
FIG. 9 is a schematic diagram for explaining the details of the learning process performed by the neural network.
FIG. 10 is a block diagram for explaining the functions of the binding prediction device.
FIG. 11 is a flowchart showing the procedure of the binding prediction process.
FIG. 12 is a schematic diagram for explaining the details of the binding prediction process.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the following description and drawings, the same reference numerals denote the same or similar components, and repeated descriptions of such components are omitted.

The embodiment described below takes as an example the case where the binding between the three-dimensional structure of a target protein and the three-dimensional structure of a compound is predicted with a deep learning algorithm having a neural network structure. The interaction pattern database and the trained deep learning algorithm are created in advance, before the binding prediction is performed.

The three-dimensional structures of proteins are acquired from the publicly known Protein Data Bank (PDB, URL https://pdbj.org/; hereinafter simply "the Protein Data Bank"). The Protein Data Bank is a database that records the three-dimensional structures of various proteins, experimentally determined by nuclear magnetic resonance, X-ray crystallography, and the like, in an internationally unified format. For example, the Protein Data Bank describes three-dimensional structures in a format called the "pdb format". In the pdb format, information is written line by line, and each line gives the X, Y, and Z coordinates of one atom.
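As a rough illustration of the one-atom-per-line layout just described, the following Python sketch reads ATOM and HETATM records using the fixed column positions of the public pdb format. The names `Atom`, `parse_pdb_atoms`, and `make_line` are hypothetical and are not taken from this application; `make_line` exists only to build a sample line for the demonstration.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" (polymer atom) or "HETATM" (ligand, water, ion)
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue or ligand name, e.g. "GLY"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Read ATOM/HETATM records; every other record type is skipped."""
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def make_line(record, serial, name, res_name, chain, res_seq, x, y, z) -> str:
    """Hypothetical helper: format one coordinate line for the demo below."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


sample = "\n".join([
    "HEADER    DEMO",
    make_line("ATOM", 1, "N", "GLY", "A", 1, 11.104, 13.207, 2.1),
    make_line("HETATM", 2, "O1", "LIG", "A", 101, 12.5, 14.0, 3.25),
])
atoms = parse_pdb_atoms(sample)
ligand_atoms = [a for a in atoms if a.record == "HETATM"]
```

Separating `HETATM` records from `ATOM` records is one simple way to tell ligand atoms from protein atoms in a complex; a real pipeline would also have to handle waters, ions, and alternate locations.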

In this embodiment, the compound whose binding is to be predicted is a low-molecular-weight compound. Its molecular weight is not particularly limited and is, for example, about 300 to 800.

The three-dimensional structure of the complex of the protein and the compound is matched against the interaction pattern database and thereby converted into vector information representing the result of matching against the interaction patterns. The converted vector information is input into the deep learning algorithm, which outputs, as the prediction result, the binding between the three-dimensional structure of the target protein and the three-dimensional structure of the compound.
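The conversion from a complex to a vector can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each residue-ligand contact is reduced to a residue type, a ligand atom element, and an offset in a residue-local frame, and that each interaction pattern is a cell of a cubic grid. The text above does not specify either representation, and `pattern_key`, `to_vector`, and `predict_binding` are hypothetical names.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Contact = Tuple[str, str, Vec3]                   # (residue type, ligand element, offset)
Pattern = Tuple[str, str, Tuple[int, int, int]]   # (residue type, ligand element, grid cell)


def pattern_key(contact: Contact, cell_size: float = 1.0) -> Pattern:
    """Map a contact to the grid-cell pattern it falls into (assumed representation)."""
    residue, element, offset = contact
    cell = tuple(math.floor(v / cell_size) for v in offset)
    return (residue, element, cell)


def to_vector(contacts: Sequence[Contact],
              patterns: Sequence[Pattern],
              cell_size: float = 1.0) -> List[int]:
    """One component per pattern: the number of contacts that match it."""
    hits = Counter(pattern_key(c, cell_size) for c in contacts)
    return [hits[p] for p in patterns]


def predict_binding(vector: Sequence[float],
                    model: Callable[[Sequence[float]], float]) -> float:
    """Hand the vector to a trained model; the model is a placeholder here."""
    return model(vector)


patterns = [("ASP", "N", (2, 0, 0)), ("PHE", "C", (0, 3, 0))]
contacts = [
    ("ASP", "N", (2.8, 0.3, 0.0)),   # falls in cell (2, 0, 0)
    ("ASP", "N", (2.1, 0.9, 0.4)),   # falls in cell (2, 0, 0)
    ("ASP", "N", (6.0, 0.0, 0.0)),   # matches no pattern
    ("PHE", "C", (0.2, 3.9, 0.1)),   # falls in cell (0, 3, 0)
]
vector = to_vector(contacts, patterns)
score = predict_binding(vector, lambda v: 1.0 if sum(v) > 0 else 0.0)  # toy stand-in model
```

Because the vector length depends only on the number of patterns, complexes of any size map to inputs of the same shape, which is what lets a single trained model score them.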

The interaction pattern database records a plurality of interaction patterns and is created in advance by a predetermined procedure. The interaction patterns are defined based on statistics of the spatial arrangement distribution of ligand atoms located around amino acids.
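One conceivable way to turn such statistics into patterns is sketched below in Python. It assumes that ligand atom offsets around each residue type are binned into cubic grid cells and that any cell observed at least `min_count` times becomes a pattern. The binning rule, the `cell_size` and `min_count` parameters, and the function names are illustrative assumptions, not the procedure of this application.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
Observation = Tuple[str, str, Vec3]                 # (residue type, ligand element, offset)
PatternKey = Tuple[str, str, Tuple[int, int, int]]  # (residue type, ligand element, grid cell)


def cell_of(offset: Vec3, cell_size: float) -> Tuple[int, int, int]:
    """Index of the cubic grid cell that contains the offset."""
    return tuple(math.floor(v / cell_size) for v in offset)


def build_pattern_database(observations: Iterable[Observation],
                           cell_size: float = 1.0,
                           min_count: int = 2) -> List[PatternKey]:
    """Count observations per (residue, element, cell) and keep the frequent ones."""
    counts = Counter(
        (residue, element, cell_of(offset, cell_size))
        for residue, element, offset in observations
    )
    # Sorting fixes the order of the patterns, and therefore of the vector components.
    return sorted(key for key, n in counts.items() if n >= min_count)


# Offsets as they might be collected by repeating "acquire complex, convert" over many complexes.
observations = [
    ("ASP", "N", (2.1, 0.2, 0.0)),
    ("ASP", "N", (2.7, 0.9, 0.5)),
    ("ASP", "N", (5.5, 0.0, 0.0)),
    ("PHE", "C", (-0.5, 3.2, 0.0)),
]
database = build_pattern_database(observations)
```

Raising `min_count` keeps only well-supported patterns and shortens the resulting vectors, at the cost of ignoring rare but possibly meaningful arrangements.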

The deep learning algorithm having a neural network structure is trained in advance in a learning step that uses training data (also called teacher data).
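The flow of training data through such a learning step can be sketched in Python as follows. To stay short and dependency-free, the sketch substitutes a single logistic unit trained by stochastic gradient descent for the multi-layer neural network, and it assumes the positive and negative labels have already been decided. The function names, hyperparameters, and data values are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (pattern-match vector, label: 1 positive, 0 negative)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, vector: Sequence[float]) -> float:
    """Score in (0, 1); values above 0.5 are read as a positive example."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, vector)))


def train(samples: Sequence[Sample], epochs: int = 500,
          lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in samples:
            error = predict(weights, bias, vector) - label   # gradient w.r.t. the logit
            weights = [w - lr * error * v for w, v in zip(weights, vector)]
            bias -= lr * error
    return weights, bias


# Made-up vectors: the first two components dominate in positives, the third in negatives.
training_data = [
    ([2, 1, 0], 1),
    ([1, 2, 0], 1),
    ([0, 0, 2], 0),
    ([0, 1, 3], 0),
]
weights, bias = train(training_data)
```

A deep network would replace `predict` with several stacked layers, but the interface stays the same: a fixed-length vector goes in and a label comes out.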

[Configuration overview]
FIG. 1 is a schematic configuration diagram of the binding prediction system according to the embodiment of the present invention. The binding prediction system includes a user-side device 100 (100A, 100B) and a data server 200. The user-side device 100 and the data server 200 are connected through a network 99. Optionally, the binding prediction system may further include an application server 300 connected through the network 99.

The user-side device 100 includes an interaction pattern database 110. It operates as a deep learning device 100A during deep learning processing that uses training data, and as a binding prediction device 100B during binding prediction processing that uses the trained deep learning algorithm. The user-side device 100 is, for example, a general-purpose computer, and performs the deep learning processing and the binding prediction processing according to the flowcharts described later. The interaction pattern database 110 is used in both the deep learning processing and the binding prediction processing.

The data server 200 includes a three-dimensional structure database 210. The three-dimensional structure database 210 records the experimentally determined three-dimensional structures of various proteins, described in a predetermined format. In this embodiment, the three-dimensional structure database 210 is the publicly known Protein Data Bank mentioned above, and in the following description the predetermined description format means the "pdb format" mentioned above.

[Hardware configuration]
FIG. 2 is a block diagram illustrating the hardware configuration of the user-side device. The user-side device 100 (100A, 100B) includes a processing unit 10 (10A, 10B), an input unit 16, and an output unit 17.

The processing unit 10 includes a CPU (Central Processing Unit) 11 that performs the data processing described later, a memory 12 used as a work area for the data processing, a recording unit 13 that records the program and processing data described later, a bus 14 that transfers data between these units, and an interface unit 15 that inputs and outputs data to and from external devices. The input unit 16 and the output unit 17 are connected to the processing unit 10. For example, the input unit 16 is an input device such as a keyboard or a mouse, and the output unit 17 is a display device such as a liquid crystal display.

To perform the processing of each step described with reference to FIGS. 3, 7, and 11, the processing unit 10 has the program and the deep learning algorithm according to the present invention recorded in advance in the recording unit 13, for example in executable form (for example, generated from a programming language by a compiler). The processing unit 10 performs processing using the program and the deep learning algorithm recorded in the recording unit 13.

The program and the trained deep learning algorithm may be installed in the recording unit 13 from a computer-readable, non-transitory tangible recording medium 98 such as a DVD-ROM or a USB memory, or may be installed in the recording unit 13 via the network 99 from an application server 300 (see FIG. 1) located elsewhere.

In the following description, unless otherwise specified, processing performed by the processing unit 10 means processing performed by the CPU 11 based on the program and the deep learning algorithm stored in the recording unit 13 or the memory 12. The CPU 11 uses the memory 12 as a work area to temporarily store necessary data (such as intermediate data during processing), and records data to be kept long term, such as calculation results, in the recording unit 13 as appropriate.

[Functional blocks and processing procedures]
In the following, the method of creating the interaction pattern database, which is used in both the deep learning processing and the binding prediction processing, is described first with reference to FIGS. 3 to 5. Next, the method of training the deep learning algorithm using training data is described with reference to FIGS. 6 to 9, and the method of predicting binding using the trained deep learning algorithm is described with reference to FIGS. 10 to 12.

・Creation of the interaction pattern database
FIG. 3 is a flowchart showing the procedure for creating the interaction pattern database, and FIG. 4 is a schematic diagram for explaining the details of the interaction pattern database creation processing. FIG. 5 is a schematic diagram for explaining the procedure for converting the three-dimensional structure of a complex into spatial arrangement information of ligand atoms around amino acids.

In this embodiment, the interaction pattern database 110 is created in advance according to the procedure shown in FIG. 3 and recorded in advance in the recording unit 13 of the user-side device 100. The processing of steps S91 to S94 below can be performed by the processing unit 10 of the user-side device 100 based on a user's operation instruction from the input unit 16.

In step S91, the processing unit 10 acquires the three-dimensional structure of a complex of a protein and a ligand from the three-dimensional structure database 210. The three-dimensional structure of the complex is recorded in the predetermined description format (that is, the "pdb format" mentioned above).

In step S92, the processing unit 10 converts the three-dimensional structure of the complex acquired from the three-dimensional structure database 210 into spatial arrangement information of ligand atoms around amino acids. FIG. 4(a) shows an example of spatial arrangement patterns of ligand atoms converted from the predetermined description format. In the example of FIG. 4(a), three spatial arrangement patterns are illustrated as examples of the arrangement of ligand atoms 42 located around amino acid atoms 41. In the figure, amino acid atoms 41 are represented by gray spheres and ligand atoms 42 by white spheres.

An example of the procedure for converting the three-dimensional structure of a complex into spatial arrangement information of ligand atoms around amino acids will be described concretely with reference to the schematic diagram of FIG. 5. In this embodiment, for a set of, for example, three atoms constituting an amino acid, the spatial arrangement distribution of ligand atoms of a given type located around those atoms is obtained. For example, consider a case where the three-dimensional structure shown on the left side of FIG. 5(a) is described in PDB file 1 and the three-dimensional structure shown on the right side of FIG. 5(a) is described in PDB file 2. First, the following steps S921 to S923 are performed on PDB file 1.

In step S921, pairs consisting of three interacting protein atoms and one ligand atom are extracted from the three-dimensional structure described in the PDB file. The extracted structures are shown in FIG. 5(b).

In step S922, the extracted structures are rotated so that their orientations are aligned. The aligned structures are shown in FIG. 5(c). FIG. 5(c) corresponds to FIG. 4(a).

In step S923, among the aligned structures, those having the same atom types are superimposed. The superimposed result is shown in FIG. 5(d). For example, the structure shown at the left end of FIG. 5(c) and the structure shown at the right end of FIG. 5(c) have the same atom types, so superimposing them yields the structure shown on the left side of FIG. 5(d).
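Steps S922 and S923 can be sketched as follows. This is a minimal illustration, not the implementation used in the embodiment: it assumes that "aligning the orientation" means expressing the ligand atom in a local coordinate frame fixed to the three protein atoms, and the function names, the choice of the middle atom as origin, and the atom-type labels are all hypothetical.

```python
import math
from collections import defaultdict

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    n = math.sqrt(dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)

def to_local_frame(a, b, c, ligand):
    """Step S922 (assumed form): coordinates of `ligand` in a frame fixed to
    protein atoms a-b-c, with b at the origin. Assumes a, b, c are not collinear."""
    ex = unit(sub(a, b))                 # x axis points from b to a
    ez = unit(cross(ex, sub(c, b)))      # z axis is normal to the a-b-c plane
    ey = cross(ez, ex)                   # y axis completes a right-handed frame
    d = sub(ligand, b)
    return (dot(d, ex), dot(d, ey), dot(d, ez))

def superimpose(fragments):
    """Step S923 (assumed form): pool aligned ligand positions that share the
    same four atom types. Each fragment is (types, (a, b, c), ligand)."""
    clouds = defaultdict(list)
    for types, (a, b, c), ligand in fragments:
        clouds[types].append(to_local_frame(a, b, c, ligand))
    return dict(clouds)

# Two copies of the same geometry, the second rotated and translated.
key = ("O.co2", "C.2", "O.co2", "C.3")   # illustrative atom-type labels
fragments = [
    (key, ((1, 0, 0), (0, 0, 0), (0, 1, 0)), (1, 2, 3)),
    (key, ((5, 6, 5), (5, 5, 5), (4, 5, 5)), (3, 6, 8)),
]
clouds = superimpose(fragments)
```

Because both fragments have the same internal geometry, they map to the same local coordinates, which is what allows positions from different PDB files to be accumulated into one distribution per atom-type combination.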

In step S93, the processing unit 10 repeats the acquisition of a three-dimensional structure in step S91 and the conversion into spatial arrangement information in step S92, thereby obtaining multiple sets of spatial arrangement information of ligand atoms for the three-dimensional structures of multiple complexes, and obtains statistics of the spatial arrangement distribution of ligand atoms around amino acids. An example of the resulting spatial arrangement distribution is shown in FIG. 4(b).

In terms of the schematic diagram of FIG. 5, steps S921 to S923 described above are also performed on PDB file 2. Performing the same processing on all PDB files of the complex structures acquired from the three-dimensional structure database 210 yields as many spatial arrangement distributions as there are combinations of the four atom types (three protein atoms and one ligand atom). The resulting spatial arrangement distributions are as illustrated in FIG. 4(b). Each of the resulting distributions is then fitted to a Gaussian mixture distribution, as described later.

In step S94, the processing unit 10 defines interaction patterns based on the statistics of the spatial arrangement distributions and creates the interaction pattern database 110.

The method of creating the interaction pattern database 110 will be described with reference to FIG. 4(c). For example, the spatial arrangement pattern in which a ligand atom 42 is located in the spatial region enclosed by broken line 1 is defined as "interaction pattern 1". Similarly, the pattern in which a ligand atom 42 is located in the spatial region enclosed by broken line 2 is defined as "interaction pattern 2", and the patterns in which a ligand atom 42 is located in the spatial regions enclosed by broken lines 3, 4, and 5 are defined as "interaction pattern 3", "interaction pattern 4", and "interaction pattern 5", respectively. In this way, a total of N interaction patterns (N is a natural number) are defined. Each of interaction patterns 1 to N is associated with the corresponding spatial arrangement information of ligand atoms and recorded in the recording unit 13 as the interaction pattern database 110.

As a result, when spatial arrangement information of ligand atoms located around amino acid residues is given for a certain compound, collating that information with the interaction pattern database 110 makes it possible to determine which of the N interaction patterns the spatial arrangement of each ligand atom of the compound corresponds to.

Here, the three-dimensional structure database 210 used to create the interaction pattern database 110 is a database recording experimentally determined three-dimensional structures of various proteins. Therefore, when collation with the interaction pattern database 110 determines that the spatial arrangement of a ligand atom of a certain compound corresponds to "interaction pattern 1", that arrangement is one that a ligand atom can actually take around an amino acid. In other words, a ligand atom can actually be located in the spatial region enclosed by broken line 1 in FIG. 4(c).

The method of defining the interaction patterns will be described with reference to FIG. 4(c). The interaction patterns are defined using variational Bayesian estimation. In this embodiment, for a set of, for example, three atoms constituting an amino acid, the spatial distribution of ligand atoms of a given type around those atoms is expressed as a Gaussian mixture distribution. For example, the spatial distribution of ligand C atoms around O-C-O in amino acids forms one Gaussian mixture distribution, and the spatial distribution of ligand C atoms around O-C-N in amino acids forms another. That is, there are as many Gaussian mixture distributions as there are combinations of atom types (this number is denoted M). Atom types are not defined simply by element; they also distinguish chemical properties such as sp2 carbon and sp3 carbon. This embodiment uses SYBYL atom types. A Gaussian mixture distribution is a linear sum of Gaussian distributions: writing a Gaussian distribution as G(μ, σ), the mixture is Σ_k π_k G(μ_k, σ_k), where k = 1, 2, ..., K and K is the number of Gaussian distributions constituting one mixture. The K sets of values π_k, μ_k, σ_k are obtained, and each Gaussian distribution is defined as an interaction pattern. The value of K differs from one spatial distribution to another (for example, between the distribution of C around O-C-O and the distribution of C around O-C-N) and is estimated automatically by the variational Bayes method. Repeating this for the M spatial distributions defines interaction patterns for the various atom-type combinations. For a more detailed procedure for defining the interaction patterns, see the inventors' paper: Kasahara K, Kinoshita K, "Landscape of Protein-Small ligand Binding Modes", 2016, Protein Science 25(9):1659-71.
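To make the mixture notation concrete, the sketch below evaluates Σ_k π_k G(μ_k, σ_k) at a ligand-atom position and picks the component that best explains it. Several things here are assumptions rather than details from the text: the Gaussians are taken to be isotropic in three dimensions, a position is matched to the component with the largest weighted density, and the `min_density` cutoff and all parameter values are hypothetical. The variational Bayesian fitting itself is not shown (a library such as scikit-learn's `BayesianGaussianMixture` is one way to do that kind of fit, though the text does not say which tool was used).

```python
import math

def gaussian3d(x, mu, sigma):
    """Isotropic 3-D Gaussian density G(mu, sigma) evaluated at point x."""
    r2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    return norm * math.exp(-r2 / (2.0 * sigma ** 2))

def mixture_density(x, components):
    """Sum over k of pi_k * G(mu_k, sigma_k) at x; components = [(pi, mu, sigma), ...]."""
    return sum(pi * gaussian3d(x, mu, sigma) for pi, mu, sigma in components)

def assign_pattern(x, components, min_density=1e-4):
    """Index of the component with the largest weighted density at x,
    or None if no component explains x (hypothetical cutoff)."""
    weighted = [pi * gaussian3d(x, mu, sigma) for pi, mu, sigma in components]
    best = max(range(len(weighted)), key=weighted.__getitem__)
    return best if weighted[best] >= min_density else None

# Hypothetical fitted mixture with K = 2 components for one atom-type combination.
components = [
    (0.6, (0.0, 0.0, 3.0), 0.5),
    (0.4, (3.0, 0.0, 0.0), 0.5),
]
near_first = assign_pattern((0.1, 0.0, 2.9), components)
near_second = assign_pattern((2.8, 0.1, 0.0), components)
far_away = assign_pattern((50.0, 50.0, 50.0), components)
```

Each component index then plays the role of one interaction pattern for that atom-type combination.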

・Deep learning processing
FIG. 6 is a block diagram for explaining the functions of the deep learning device 100A. The processing unit 10A of the deep learning device 100A includes a complex acquisition unit 101, a spatial information conversion unit 102, a spatial vector conversion unit 103, a complex division unit 104, a predicted structure generation unit 105, a predicted vector conversion unit 106, and a deep learning unit 107. These functional blocks are realized by installing a program that causes a computer to execute the deep learning processing in the recording unit 13 of the processing unit 10A and having the CPU 11 execute the program.

The interaction pattern database 110 is recorded in advance in the recording unit 13 of the processing unit 10A. The untrained or partially trained deep learning algorithm that the deep learning unit 107 trains is also recorded in advance in the recording unit 13 of the processing unit 10A. The trained deep learning algorithm 108 produced by the deep learning processing, which is the output of the deep learning device 100A, is recorded in the recording unit 13 of the processing unit 10A.

FIG. 7 is a flowchart showing the procedure of the deep learning processing, and FIG. 8 is a schematic diagram for explaining the details of the deep learning processing.

The processing unit 10A of the deep learning device 100A performs the processing shown in FIG. 7. In terms of the functional blocks shown in FIG. 6, step S1 is performed by the complex acquisition unit 101, step S2 by the spatial information conversion unit 102, step S3 by the spatial vector conversion unit 103, step S4 by the complex division unit 104, step S5 by the predicted structure generation unit 105, step S6 by the predicted vector conversion unit 106, and step S7 by the deep learning unit 107.

Steps S1 to S6 prepare the training data used for deep learning. Of these, steps S2 and S3 prepare the training data used as positive examples. The training data prepared by steps S4 to S6 is a mixture of training data used as positive examples and training data used as negative examples.

In step S1, the processing unit 10A acquires three-dimensional structures of protein-ligand complexes from the three-dimensional structure database 210. The three-dimensional structure database 210 records experimentally determined three-dimensional structures of various proteins in the predetermined description format, and the processing unit 10A acquires multiple complex structures to be used for training the deep learning algorithm. As an example, about 3,000 complex structures are acquired.

In step S2, the processing unit 10A converts the three-dimensional structure of each complex acquired from the three-dimensional structure database 210 into spatial arrangement information of ligand atoms around amino acids. FIG. 8(a) shows an example of spatial arrangement patterns of ligand atoms converted from the predetermined description format. In the example of FIG. 8(a), three spatial arrangement patterns are illustrated as examples of the arrangement of ligand atoms 42 located around amino acid atoms 41. The processing in step S2 is the same as that in step S92 used when creating the interaction pattern database 110.

The processing unit 10A repeats the conversion into spatial arrangement information shown in step S2 for each of the complex structures acquired in step S1.

In step S3, the processing unit 10A collates the spatial arrangement information of the ligand atoms with the interaction pattern database 110 and converts it into a spatial arrangement vector. For example, in the left diagram of FIG. 8(b), the spatial arrangement of the ligand atom 42 matches "interaction pattern 2", and in the right diagram, the spatial arrangement of the ligand atom 42 matches "interaction pattern 4".

By performing this collation with the interaction pattern database 110 for each ligand atom in the spatial arrangement information, a spatial arrangement vector 51 representing the result of collation with the interaction patterns is obtained. Because the spatial arrangement vector 51 obtained in step S3 is generated from the three-dimensional structure database 210, which records experimentally determined three-dimensional structures of various proteins, it represents a correct spatial arrangement (positive example) that ligand atoms located around amino acids can actually take.

The processing unit 10A converts each set of spatial arrangement information obtained in step S2 into a spatial arrangement vector 51, thereby obtaining the multiple spatial arrangement vectors 51 illustrated in FIG. 8(c). Because all of these spatial arrangement vectors 51 represent correct spatial arrangements that ligand atoms 42 can take around amino acid atoms 41, each is recorded with a label value 52 indicating a positive example. In this embodiment, the value "1" or the Boolean value "True" indicates a positive example, and the value "0" or the Boolean value "False" indicates a negative example.

The spatial arrangement vector 51 will be described concretely with reference to FIG. 8(c). Take as an example a spatial arrangement vector 51 represented by the seven-dimensional integer vector "0103000". In this example, the second element is "1", the fourth element is "3", and the first, third, fifth, sixth, and seventh elements are "0". This spatial arrangement vector 51 therefore means that there is one pair whose ligand atom is at the position represented by "interaction pattern 2", there are three pairs whose ligand atom is at the position represented by "interaction pattern 4", and no ligand atom 42 is at the positions represented by interaction patterns 1, 3, 5, 6, and 7.
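The counting described above can be sketched in a few lines. The function name and the representation of collation results as a list of 1-based pattern numbers are assumptions made for illustration.

```python
def to_pattern_vector(pattern_hits, n_patterns):
    """Count how many protein-ligand atom pairs fall into each interaction pattern.

    pattern_hits: 1-based pattern numbers, one per matched pair (hypothetical input form).
    n_patterns:   total number N of interaction patterns.
    """
    vec = [0] * n_patterns
    for p in pattern_hits:
        vec[p - 1] += 1
    return vec

# One pair matches pattern 2 and three pairs match pattern 4, with N = 7.
vector = to_pattern_vector([2, 4, 4, 4], 7)
print(vector)  # -> [0, 1, 0, 3, 0, 0, 0]
```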

In step S4, the processing unit 10A divides the three-dimensional structure of each complex acquired from the three-dimensional structure database 210 in step S1 into the three-dimensional structure of the protein and the three-dimensional structure of the ligand. In this embodiment, where the three-dimensional structure is described in the pdb format, the record identifier "TER" marks the boundary between the lines describing the protein structure and the lines describing the ligand structure. Splitting the pdb file at the boundary identified in this way therefore separates the protein structure from the ligand structure.
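A minimal sketch of the split described in step S4 follows. It assumes the simplest case, a file with a single "TER" record, with coordinate lines before it belonging to the protein and coordinate lines after it belonging to the ligand; files with several chains (and therefore several "TER" records), water molecules, or other hetero groups would need additional handling. The function name and sample records are hypothetical.

```python
def split_complex(pdb_text):
    """Split pdb-format text into (protein_lines, ligand_lines) at the TER record.

    Assumes exactly one TER record separates protein from ligand.
    Only ATOM and HETATM coordinate records are kept.
    """
    protein, ligand = [], []
    seen_ter = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()        # record name occupies columns 1-6
        if record == "TER":
            seen_ter = True
            continue
        if record not in ("ATOM", "HETATM"):
            continue
        (ligand if seen_ter else protein).append(line)
    return protein, ligand

sample = "\n".join([
    "ATOM      1  N   GLY A   1      11.104  13.207   9.300  1.00 20.00           N",
    "ATOM      2  CA  GLY A   1      12.560  13.100   9.450  1.00 20.00           C",
    "TER       3      GLY A   1",
    "HETATM    4  C1  LIG A 101      15.000  14.000  10.000  1.00 20.00           C",
    "END",
])
protein_lines, ligand_lines = split_complex(sample)
```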

In step S5, the processing unit 10A generates predicted three-dimensional structures of the protein-ligand complex based on the divided structures. In this embodiment, multiple three-dimensional structures of the complex are predicted using, for example, AutoDock, a molecular modeling simulation software package. The predicted complex structures are recorded in the predetermined description format. This prediction is performed for each of the complex structures acquired in step S1. As an example, AutoDock predicts about 13,000 candidate docking structures. The predicted complex structures obtained at the end of step S5 include both correct predicted structures (positive examples) and incorrect predicted structures (negative examples).

In step S6, the processing unit 10A collates each predicted complex structure generated in step S5 with the interaction pattern database 110 and converts it into a predicted structure vector.

As in step S3, each ligand atom in the predicted structure is collated with the interaction pattern database 110 as illustrated in FIG. 8(b), yielding a predicted structure vector 53 that represents the result of collation with the interaction patterns.

The processing unit 10A converts each of the predicted complex structures obtained in step S5 into a predicted structure vector 53, thereby obtaining the multiple predicted structure vectors 53 illustrated in FIG. 8(d). At this point, no label value 54 indicating a positive or negative example has been assigned to the predicted structure vectors 53. The label value 54 of each predicted structure vector 53 is set to positive or negative in step S7, described later, by comparison with the spatial arrangement vectors 51 obtained in step S3.

In step S7, the processing unit 10A trains the deep learning algorithm using the spatial arrangement vectors 51 obtained in step S3 and the predicted structure vectors 53 obtained in step S6 as training data.

FIG. 9 is a schematic diagram for explaining the details of training with a neural network. A deep learning type neural network is a neural network that, like the neural network 60 shown in FIG. 9, includes an input layer 61a, an output layer 61b, and an intermediate layer 61c between the input layer 61a and the output layer 61b, where the intermediate layer 61c consists of multiple layers. For the deep learning type, the intermediate layer 61c can consist of, for example, five or more layers.

In the neural network 60, nodes 62 arranged in layers are connected only between adjacent layers. Information therefore propagates in one direction only, from the input layer 61a to the output layer 61b, as indicated by arrow D in the figure. The number of nodes in the input layer 61a corresponds, for example, to the number N of interaction patterns.

A coefficient called a connection weight w (also called a synaptic weight) is set for each connection between nodes 62 in adjacent layers. Training a neural network means feeding it training data and updating these connection weights w toward optimal values, for example with an algorithm based on error backpropagation. Deep learning with a neural network can be carried out using, for example, a publicly known software toolkit.

First, the processing unit 10A compares the spatial arrangement vectors 51, which consist only of positive examples, with the predicted structure vectors 53, which are a mixture of positive and negative examples, and determines and records the label value 54 for each predicted structure vector 53. A predicted structure vector 53 that is a positive example is recorded with the label value 54 set to "1", and a predicted structure vector 53 that is a negative example is recorded with the label value 54 set to "0".
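The text does not state the criterion used when comparing the two kinds of vectors. Purely as an illustration, the sketch below assumes that a predicted structure vector is labeled positive when it differs from the reference spatial arrangement vector of the same complex by at most `max_mismatch` counts in total; the function names, the distance measure, and the threshold are all hypothetical.

```python
def l1_distance(u, v):
    """Sum of absolute element-wise differences between two pattern vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def label_predictions(reference_vec, predicted_vecs, max_mismatch=0):
    """Return 1 (positive) or 0 (negative) for each predicted vector.

    reference_vec:  spatial arrangement vector from the experimental structure.
    predicted_vecs: predicted structure vectors for the same complex.
    max_mismatch:   assumed tolerance; 0 means an exact match is required.
    """
    return [1 if l1_distance(reference_vec, p) <= max_mismatch else 0
            for p in predicted_vecs]

reference = [0, 1, 0, 3, 0, 0, 0]
predicted = [
    [0, 1, 0, 3, 0, 0, 0],   # same pattern counts as the reference
    [1, 0, 0, 2, 1, 0, 0],   # different pattern counts
]
labels = label_predictions(reference, predicted)
print(labels)  # -> [1, 0]
```

Other criteria are equally plausible, for example judging correctness from the geometric deviation of the docked pose from the experimental pose before vectorization.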

Next, as shown in FIG. 9, the processing unit 10A inputs the vector information of a spatial arrangement vector 51 or a predicted structure vector 53 to the input layer 61a of the neural network structure 60 constituting the deep learning algorithm, and supplies the label value corresponding to that vector information to the output layer 61b of the neural network structure 60 as the target output.

By repeating this learning process over the plurality of spatial arrangement vectors 51 or predicted three-dimensional structure vectors 53, the connection weights w are updated toward optimal values. The trained deep learning algorithm 108, which includes the optimized connection weights w and the neural network 60, is recorded in the recording unit 13.
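The learning process described above can be illustrated with a minimal sketch of a network with one intermediate layer trained by error backpropagation. This is not the embodiment's implementation, which relies on an unnamed software toolkit. The layer size, learning rate, epoch count, sigmoid activations, and cross-entropy loss are all assumptions chosen for brevity, and the names `train` and `predict` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(vectors, labels, hidden=4, epochs=500, lr=0.5, seed=0):
    """Update the connection weights w by backpropagation (assumed hyperparameters)."""
    rng = random.Random(seed)
    n_in = len(vectors[0])
    # Each row holds the weights into one node; the last entry is a bias weight.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            # Output error for a sigmoid output with cross-entropy loss.
            d_out = out - y
            # Propagate the error back to the intermediate layer before updating w2.
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                w2[j] -= lr * d_out * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w1[j][i] -= lr * d_hid[j] * xb[i]
    return w1, w2


def predict(model, x):
    """Forward pass: returns a value in (0, 1); values near 1 mean a positive example."""
    w1, w2 = model
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w2, hb)))
```

The vector goes in at the input layer and the label is the target at the output layer. The returned pair of weight tables plays the role of the trained algorithm that would be stored in the recording unit.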

・Binding prediction process
FIG. 10 is a block diagram illustrating the functions of the binding prediction apparatus 100B. The processing unit 10B of the binding prediction apparatus 100B includes a prediction target acquisition unit 111, a three-dimensional structure acquisition unit 112, a predicted structure generation unit 113, a predicted vector conversion unit 114, and a binding prediction unit 115. These functional blocks are realized by installing, in the recording unit 13 of the processing unit 10B, a program that causes a computer to execute the binding prediction process, and by having the CPU 11 execute that program.

The interaction pattern database 110 is recorded in advance in the recording unit 13 of the processing unit 10B. The trained deep learning algorithm 108 used in the binding prediction process is the one trained in the deep learning apparatus 100A.

FIG. 11 is a flowchart showing the procedure of the binding prediction process, and FIG. 12 is a schematic diagram illustrating the details of the binding prediction process.

The processing unit 10B of the binding prediction apparatus 100B performs the process shown in FIG. 11. In terms of the functional blocks shown in FIG. 10, step S11 is performed by the prediction target acquisition unit 111, step S12 by the three-dimensional structure acquisition unit 112, step S13 by the predicted structure generation unit 113, step S14 by the predicted vector conversion unit 114, and step S15 by the binding prediction unit 115.
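The division of steps S11 to S15 among the functional blocks can be sketched as a simple pipeline. The callables passed in (`fetch_structure`, `dock`, `to_vector`, `classify`) are hypothetical stand-ins for units 112 to 115. Their names and signatures are assumptions for illustration and do not come from the source.

```python
def run_binding_prediction(target_id, compound, fetch_structure, dock, to_vector, classify):
    """Chain steps S11-S15; every callable is a hypothetical stand-in.

    S11: target_id and compound are the acquired inputs (unit 111).
    """
    protein = fetch_structure(target_id)            # S12: structure database lookup (unit 112)
    poses = dock(protein, compound)                 # S13: predicted complex structures (unit 113)
    vectors = [to_vector(pose) for pose in poses]   # S14: pattern matching to vectors (unit 114)
    return [(vector, classify(vector)) for vector in vectors]  # S15: label per vector (unit 115)


if __name__ == "__main__":
    result = run_binding_prediction(
        "target", "compound",
        fetch_structure=lambda name: name + "-3d",
        dock=lambda protein, ligand: [(protein, ligand, 0), (protein, ligand, 1)],
        to_vector=lambda pose: [pose[2]],
        classify=lambda vector: vector[0],
    )
    print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular docking tool or classifier.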

In step S11, the processing unit 10B acquires the designation of the target protein and the three-dimensional structure of the compound whose binding is to be predicted. In this embodiment, the three-dimensional structure of the compound whose binding is to be predicted is one that has been confirmed experimentally.

The designation of the target protein is input by the user to the processing unit 10B via the input unit 16. The three-dimensional structure of the compound whose binding is to be predicted is also input by the user, for example via the input unit 16. Alternatively, the three-dimensional structure of the compound is recorded in advance in the recording unit 13 and is made available for processing in the processing unit 10B when the user designates it via the input unit 16.

In step S12, the processing unit 10B acquires, from the three-dimensional structure database 210, the three-dimensional structure of the protein designated in step S11.

In step S13, the processing unit 10B generates predicted three-dimensional structures of the complex of the protein and the compound, based on the three-dimensional structure of the target protein acquired in step S12 and the three-dimensional structure of the compound acquired in step S11.

As in step S5 of the deep learning process, this embodiment predicts a plurality of three-dimensional structures of the complex using, for example, the molecular modeling simulation software AutoDock. The predicted three-dimensional structures of the complex are recorded in a predetermined description format.

In step S14, the processing unit 10B matches the predicted three-dimensional structure of the complex generated in step S13 against the interaction pattern database 110 and converts the predicted three-dimensional structure of the complex into a predicted three-dimensional structure vector.

As in step S6 of the deep learning process, each ligand atom in the predicted three-dimensional structure is matched against the interaction pattern database 110 illustrated in FIG. 8(b), which yields a predicted three-dimensional structure vector 55 representing the result of the matching against the interaction patterns.

The processing unit 10B converts each of the plurality of three-dimensional structures of the complex obtained in step S13 into a predicted three-dimensional structure vector 55, and thereby obtains the plurality of predicted three-dimensional structure vectors 55 illustrated in FIG. 12.
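The conversion of one predicted structure into a vector can be sketched as below. The actual layout of the interaction pattern database (FIG. 8(b)) is not reproduced in this passage. The sketch therefore assumes, hypothetically, that each pattern is an element symbol plus a spherical region given by a center and radius in a residue-relative frame, and that each vector component counts the ligand atoms falling into that pattern. The function name `to_pattern_vector` is also hypothetical.

```python
import math


def to_pattern_vector(ligand_atoms, patterns):
    """Count, for each pattern, the ligand atoms that match it.

    ASSUMPTIONS (not from the source):
      ligand_atoms: list of (element, (x, y, z)) in a residue-relative frame.
      patterns:     list of (element, (cx, cy, cz), radius) spherical regions.
    """
    vector = [0] * len(patterns)
    for element, position in ligand_atoms:
        for k, (p_element, center, radius) in enumerate(patterns):
            if element == p_element and math.dist(position, center) <= radius:
                vector[k] += 1
    return vector


if __name__ == "__main__":
    patterns = [("O", (0.0, 0.0, 3.0), 1.0), ("N", (3.0, 0.0, 0.0), 1.0)]
    atoms = [("O", (0.0, 0.5, 3.0)), ("N", (10.0, 0.0, 0.0)), ("O", (0.0, 0.0, 2.5))]
    print(to_pattern_vector(atoms, patterns))
```

Applying the function to every docked pose yields the set of vectors passed on to step S15.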

In step S15, the processing unit 10B inputs the predicted three-dimensional structure vectors 55 acquired in step S14 to the trained deep learning algorithm 108 and predicts the binding between the three-dimensional structure of the target protein and the three-dimensional structure of the compound under prediction.

The processing unit 10B inputs the vector information of a predicted three-dimensional structure vector 55 to the input layer 61a of the neural network structure 60 constituting the trained deep learning algorithm 108. The output layer 61b of the neural network structure 60 outputs a label value 56 corresponding to the vector information input to the input layer 61a. A predicted three-dimensional structure vector 55 whose label value 56 is "1" is a spatial arrangement of ligand atoms judged to be capable of actually existing, and a predicted three-dimensional structure vector 55 whose label value 56 is "0" is a spatial arrangement of ligand atoms judged to be incapable of actually existing.

From the prediction result 116, which contains a plurality of pairs of a predicted three-dimensional structure vector 55 and a label value 56, the processing unit 10B outputs to the output unit 17, as output to the user, for example the predicted three-dimensional structure vectors 55 of the pairs whose label value 56 is "1". Alternatively, instead of being output to the output unit 17, the prediction result 116 may be recorded in the recording unit 13.
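Selecting the pairs to show the user amounts to a filter over the prediction result. The following minimal sketch assumes the result is held as a list of (vector, label) pairs with integer labels. Both the representation and the name `select_feasible` are hypothetical.

```python
def select_feasible(prediction_result):
    """Keep the vectors whose label value is 1 (judged able to actually exist).

    ASSUMPTION: prediction_result is a list of (vector, label) pairs with
    integer labels 0 or 1; the source does not fix a data layout.
    """
    return [vector for vector, label in prediction_result if label == 1]


if __name__ == "__main__":
    result = [([2, 0], 1), ([0, 1], 0), ([1, 1], 1)]
    print(select_feasible(result))
```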

As described above, the present invention can provide a binding prediction method, apparatus, program, and recording medium, as well as a method for manufacturing a machine learning algorithm used for binding prediction, that achieve high prediction accuracy and improved computation speed.

Conventional methods for predicting the binding between the three-dimensional structure of a target protein and the three-dimensional structure of a low-molecular-weight compound calculate the interaction energy by quantum chemical calculation from the coordinate data of the arrangement of the protein and the compound. In contrast, the binding prediction method of the present invention makes its prediction from the difference from the probability distribution of the arrangement patterns of atoms located in space (that is, by matching against the interaction pattern database), so the complex energy calculations of quantum chemistry become unnecessary and simulation speed can be improved. In addition, because matching against the interaction pattern database vectorizes the spatial arrangement pattern of the atoms, the result is in a form well suited to a deep learning algorithm using a neural network.

Furthermore, the binding prediction method of the present invention can take the predicted structures obtained with molecular modeling simulation software such as AutoDock and add an evaluation based on the probability distribution described above, so prediction accuracy can be improved. Illustratively, the prediction accuracy is about 90% or higher.

[Additional Notes]
Although the present invention has been described by way of a specific embodiment, the present invention is not limited to the embodiment described above.

In the above embodiment, binding to a compound is predicted with a protein as the target, but a biopolymer such as a nucleic acid (DNA, RNA) or a polysaccharide may be used instead of a protein.

In the above embodiment, the interaction patterns are defined based on statistics of the spatial arrangement distribution of ligand atoms located around amino acids. However, the spatial arrangement of ligand atoms is not limited to atoms located around amino acid residues and may be that of atoms located around nucleotide residues, monosaccharide residues, or the like.

In the above embodiment, the three-dimensional structure of the protein is acquired from the Protein Data Bank, but any other database that accumulates three-dimensional structures of biopolymers such as proteins may be used. Alternatively, instead of the Protein Data Bank, which is a public database, a private database that accumulates, for example, the three-dimensional structures of unpublished proteins still at the in-house research stage may be created in advance on a server within the company or laboratory, and the three-dimensional structure of the protein may be acquired from this private database.

In the above embodiment, a deep learning algorithm with a neural network structure is used as the machine learning algorithm, but the machine learning algorithm is not limited to this, and various machine learning algorithms such as support vector machines and random forests can be used.

In the above embodiment, the three-dimensional structure of the compound whose binding is to be predicted is either acquired from the user via the input unit 16 or taken from a three-dimensional structure recorded in advance in the recording unit 13. However, the manner of acquiring the three-dimensional structure of the compound is not limited to these. For example, the user may designate the compound of interest via the input unit 16, and the three-dimensional structure of the compound corresponding to that designation may be acquired from a three-dimensional structure database such as the Protein Data Bank.

In the above embodiment, the three-dimensional structure of the compound whose binding is to be predicted is one that has been confirmed experimentally, but it may instead be a three-dimensional structure obtained theoretically.

In the above embodiment, the three-dimensional structure of the complex is predicted using, for example, the molecular modeling simulation software AutoDock, but the software for predicting the three-dimensional structure of the complex is not limited to AutoDock, and various known molecular modeling simulation software can be used.

In the above embodiment, the processing unit 10 is realized as a single integrated apparatus, but it need not be. The CPU 11, the memory 12, the recording unit 13, and so on may be located in separate places and connected by a network. Likewise, the processing unit 10, the input unit 16, and the output unit 17 need not be located in one place and may each be located elsewhere and connected so that they can communicate with one another over a network.

In the above embodiment, the functional blocks of the processing units 10A and 10B are executed by a single CPU 11, but they need not be executed by a single CPU 11 and may be distributed across a plurality of CPUs.

In the above embodiment, in the deep learning apparatus 100A, the interaction pattern database 110 and the deep learning algorithm before or during training are recorded in advance in the recording unit 13 of the processing unit 10A, but they may instead be recorded on an external server (for example, the application server 300 shown in FIG. 1) and loaded into the processing unit 10A via the network 99. Similarly, in the above embodiment, in the binding prediction apparatus 100B, the interaction pattern database 110 and the trained deep learning algorithm 108 are recorded in advance in the recording unit 13 of the processing unit 10B, but they too may be recorded on an external server (for example, the application server 300 shown in FIG. 1) and loaded into the processing unit 10A via the network 99.

In the above embodiment, the input unit 16 is realized by an input device such as a keyboard or a mouse, and the output unit 17 is realized by a display device such as a liquid crystal display, but the input unit 16 and the output unit 17 may be integrated and configured as a touch-panel display device.

10 (10A, 10B) Processing unit
11 CPU
12 Memory
13 Recording unit
14 Bus
15 Interface unit
16 Input unit
17 Output unit
41 Amino acid atom
42 Ligand atom
51 Spatial arrangement vector
52 Label value
53 Predicted three-dimensional structure vector
54 Label value
55 Predicted three-dimensional structure vector
56 Label value
60 Neural network
61a Input layer
61b Output layer
61c Intermediate layer
62 Node
98 Recording medium
99 Network
100 User-side apparatus
100A Deep learning apparatus
100B Binding prediction apparatus
101 Complex acquisition unit
102 Spatial information conversion unit
103 Spatial vector conversion unit
104 Complex division unit
105 Predicted structure generation unit
106 Predicted vector conversion unit
107 Deep learning unit
108 Deep learning algorithm
110 Interaction pattern database
111 Prediction target acquisition unit
112 Three-dimensional structure acquisition unit
113 Predicted structure generation unit
114 Predicted vector conversion unit
115 Binding prediction unit
116 Prediction result
200 Data server
210 Three-dimensional structure database
300 Application server

Claims

Translated from Japanese

The method according to claim 1, wherein the training data used to train the machine learning algorithm is generated based on an interaction pattern database including a plurality of interaction patterns defined based on statistics of the spatial arrangement distribution of ligand atoms located around residues.

The method according to claim 1 or 2, wherein the interaction pattern database is generated by a method comprising:
a step of acquiring, from the three-dimensional structure database, a three-dimensional structure of a complex of a biopolymer and a ligand;
a step of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues;
a step of acquiring statistics of the spatial arrangement distribution of ligand atoms located around residues by repeating the step of acquiring the three-dimensional structure and the step of converting into the spatial arrangement information; and
a step of defining a plurality of interaction patterns based on the statistics of the spatial arrangement distribution.

A computer-readable, non-transitory, tangible recording medium on which the program according to claim 5 is recorded.

The method for manufacturing a machine learning algorithm according to claim 7, wherein the step of training the machine learning algorithm is a step of determining, with the spatial arrangement vectors as positive examples, a label indicating a positive example or a negative example for each predicted three-dimensional structure vector, and training the machine learning algorithm with the predicted three-dimensional structure vector as the input layer and the label as the output layer.

The method for manufacturing a machine learning algorithm according to claim 7 or 8, wherein the interaction pattern database is generated by a method comprising:
a step of acquiring, from the three-dimensional structure database, a three-dimensional structure of a complex of a biopolymer and a ligand;
a step of converting the three-dimensional structure of the complex acquired from the three-dimensional structure database into spatial arrangement information of ligand atoms located around residues;
a step of acquiring statistics of the spatial arrangement distribution of ligand atoms located around residues by repeating the step of acquiring the three-dimensional structure and the step of converting into the spatial arrangement information; and
a step of defining a plurality of interaction patterns based on the statistics of the spatial arrangement distribution.

A computer-readable, non-transitory, tangible recording medium on which the program according to claim 11 is recorded.