JP4580206B2


Info

Publication number: JP4580206B2
Application number: JP2004296917A
Authority: JP
Inventors: 大亮西川; 公徳嶋本
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 2004-10-08
Filing date: 2004-10-08
Publication date: 2010-11-10
Anticipated expiration: 2024-10-08
Also published as: JP2006107395A

Description

The present invention relates to a compound estimation apparatus, a compound estimation method, and a program therefor, for estimating a compound that has pharmacological activity against a specific disease or the like (that is, a compound that is effective against the specific disease or the like).

As one drug discovery screening technique, a technique has been disclosed in which three-dimensional structure information and pharmacological activity information of compounds are registered in a database, and lead candidate compounds are extracted when a user specifies a pharmacological activity item (see, for example, Patent Document 1).

Patent Document 1: JP 2002-73626 A

In recent years, databases have been enriched not only with information on compounds, such as structural information, pharmacological activity information, and toxicity information, but also with information on the relationship between cell information (gene information of cells) and diseases. Accordingly, there is a growing demand for a technique that uses such information to estimate, for cell information whose gene information is known but for which the pharmacological activity or toxicity of compounds is unknown, compounds that have pharmacological activity or toxicity, or conversely compounds that have low pharmacological activity or toxicity.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a compound estimation apparatus, a compound estimation method, and a program therefor that can estimate, for cell information for which the pharmacological activity or toxicity of compounds is unknown, a compound having pharmacological activity or toxicity, or a compound having low pharmacological activity or toxicity.

The present invention has been made to solve the above problem. A compound estimation apparatus according to the present invention comprises: related information storage means for storing, in association with cell specifying information that specifies a cell, first expression information including the expression level of each gene constituting a first expression pattern, which is the gene expression pattern of the cell specified by the cell specifying information, and compound information including information on the pharmacological activity or toxicity of a plurality of types of compounds with respect to the cell specified by the cell specifying information; distance calculation means for, when second expression information is obtained that includes the expression level of each gene constituting a second expression pattern, which is the gene expression pattern of an unknown cell whose compound information is unknown, calculating a distance by normalizing, by the number of dimensions, the Euclidean distance between the expression levels included in the first expression information and the expression levels included in the second expression information for the genes present in both the first expression pattern and the second expression pattern; and compound estimation means for estimating a compound having pharmacological activity or toxicity with respect to the unknown cell, or a compound having low pharmacological activity or toxicity with respect to the unknown cell, based on the distance between the unknown cell and each cell calculated by the distance calculation means and on the compound information for each cell referenced from the related information storage means.

With this configuration, the compound estimation apparatus according to the present invention can estimate, for unknown cell information (information about a cell, including expression information) for which the pharmacological activity or toxicity of compounds is unknown, a compound having pharmacological activity or toxicity, or a compound having low pharmacological activity or toxicity, based on the distance between the unknown cell and each cell and on the compound information for each cell. Here, a compound having low pharmacological activity or toxicity means a compound whose pharmacological activity value or toxicity value is relatively low, or a compound whose pharmacological activity value or toxicity value is lower than a certain reference value.

In one aspect of the compound estimation apparatus according to the present invention, the compound information includes pharmacological activity values or toxicity values obtained by quantifying the pharmacological activity or toxicity of the plurality of types of compounds with respect to the cell specified by the cell specifying information, and the compound estimation means estimates a compound having pharmacological activity or toxicity with respect to the unknown cell, or a compound having low pharmacological activity or toxicity with respect to the unknown cell, by estimating the pharmacological activity value or toxicity value of each compound for the unknown cell based on the distance calculated by the distance calculation means and on the compound information for each cell referenced from the related information storage means.

In one aspect of the compound estimation apparatus according to the present invention, the apparatus further comprises: an expression information database that stores the first expression information; a compound information database that stores the compound information; and information registration means for registering, in the related information storage means, the first expression information and the compound information in association with the cell specifying information that specifies the cell, by referring to the first expression information in the expression information database and to the compound information in the compound information database.

In one aspect of the compound estimation apparatus according to the present invention, the distance calculation means calculates the distance according to Formula (1), where γ (γ = 1, 2, ..., Γ) identifies each gene present in both the first expression pattern and the second expression pattern, cγ is the expression level included in the first expression information, xγ is the expression level included in the second expression information, and d is the distance.

In one aspect of the compound estimation apparatus according to the present invention, the compound information database further stores structure information, which is information on the structure of each compound, and the apparatus further comprises: display means for displaying various kinds of information; and estimation result display means for referring to the structure information in the compound information database and causing the display means to display an estimation result screen showing the structure of a compound estimated by the compound estimation means to have pharmacological activity or toxicity with respect to the unknown cell, or the structure of a compound estimated to have low pharmacological activity or toxicity with respect to the unknown cell.

A compound estimation method according to the present invention uses a compound estimation apparatus comprising: related information storage means for storing, in association with cell specifying information that specifies a cell, first expression information including the expression level of each gene constituting a first expression pattern, which is the gene expression pattern of the cell specified by the cell specifying information, and compound information including information on the pharmacological activity or toxicity of a plurality of types of compounds with respect to the cell specified by the cell specifying information; expression information acquisition means; distance calculation means; and compound estimation means. The method comprises: an acquisition step in which the expression information acquisition means obtains second expression information including the expression level of each gene constituting a second expression pattern, which is the gene expression pattern of an unknown cell whose compound information is unknown; a distance calculation step in which the distance calculation means calculates a distance by normalizing, by the number of dimensions, the Euclidean distance between the expression levels included in the first expression information and the expression levels included in the second expression information for the genes present in both the first expression pattern and the second expression pattern; and a compound estimation step in which the compound estimation means estimates a compound having pharmacological activity or toxicity with respect to the unknown cell, based on the distance between the unknown cell and each cell calculated in the distance calculation step and on the compound information for each cell referenced from the related information storage means.

A program according to the present invention causes a computer to function as: related information storage means for storing, in association with cell specifying information that specifies a cell, first expression information including the expression level of each gene constituting a first expression pattern, which is the gene expression pattern of the cell specified by the cell specifying information, and compound information including information on the pharmacological activity or toxicity of a plurality of types of compounds with respect to the cell specified by the cell specifying information; distance calculation means for, when second expression information is obtained that includes the expression level of each gene constituting a second expression pattern, which is the gene expression pattern of an unknown cell whose compound information is unknown, calculating a distance by normalizing, by the number of dimensions, the Euclidean distance between the expression levels included in the first expression information and the expression levels included in the second expression information for the genes present in both the first expression pattern and the second expression pattern; and compound estimation means for estimating a compound having pharmacological activity or toxicity with respect to the unknown cell, or a compound having low pharmacological activity or toxicity with respect to the unknown cell, based on the distance between the unknown cell and each cell calculated by the distance calculation means and on the compound information for each cell referenced from the related information storage means.

According to the compound estimation apparatus, the compound estimation method, and the program therefor of the present invention, it is possible to estimate, for cell information for which the pharmacological activity or toxicity of compounds is unknown, a compound having pharmacological activity or toxicity, or a compound having low pharmacological activity or toxicity.

Embodiments of the present invention are described below.
The compound estimation apparatus according to one embodiment of the present invention is an apparatus that performs a process of estimating a compound having pharmacological activity against a specific disease or the like (a compound that is effective against the specific disease or the like). Its schematic configuration is described below. FIG. 1 is a diagram showing the schematic configuration of the compound estimation apparatus in the present embodiment.

In FIG. 1, reference numeral 1 denotes a compound estimation apparatus, which performs, for example, a process of estimating a compound having pharmacological activity against cancer cells. Reference numeral 2 denotes a network, for example the Internet. Reference numeral 3 denotes the NCI (National Cancer Institute) database, a database published by the NCI and used in the present embodiment. Specifically, it stores at least expression information, which is information on the gene expression patterns of cancer cells, and compound information, which is information on the pharmacological activity values of compounds against cancer cells. That is, the compound estimation apparatus 1 acquires the expression information and the compound information from the NCI database 3 via the network 2 and uses them to estimate compounds likely to have pharmacological activity against a cancer cell whose pharmacological activity is unknown. Although not illustrated, the compound estimation apparatus 1 includes input devices such as a mouse and a keyboard, and a display device such as a CRT (Cathode Ray Tube) or a liquid crystal display.

Here, the gene expression pattern of cancer cells in the expression information is information on the expression level (a quantity indicating whether or not a gene is functioning) of each of a plurality of types of genes, for each of a plurality of types of cancer cells. That is, a specific combination of genes (a gene pattern) is expressed in a specific cancer cell. The compound information is information indicating the pharmacological activity value of each of a plurality of types of compounds, for each of a plurality of types of cancer cells. Specific examples of the expression information and the compound information are described later.

Next, the functional configuration of the compound estimation apparatus 1 is described. Reference numeral 11 denotes a control unit, which controls each processing unit and the flow of data in the compound estimation apparatus 1. Reference numeral 12 denotes a database, which consists of an expression information database 12a that stores the expression information, a compound information database 12b that stores the compound information, and a related information database 12c that stores related information, which is information derived from the expression information and the compound information on the relationship between gene expression patterns and compounds.

Reference numeral 13 denotes an information registration processing unit, which acquires expression information from the NCI database 3 via the network 2 and a transmission/reception processing unit 18 (described later) and registers it in the expression information database 12a, and which acquires compound information from the NCI database 3 and registers it in the compound information database 12b. In the present embodiment, the information registration processing unit 13 acquires the T-Matrix (expression information), which is information on the gene expression patterns of cancer cells, from the NCI database 3 and registers the necessary information in the expression information database 12a. The information registration processing unit 13 also acquires the A-Matrix (compound information), which is information on the pharmacological activity values of compounds against cancer cells, from the NCI database 3 and registers the necessary information in the compound information database 12b.

Here, examples of the data structure of the expression information and the compound information stored in the expression information database 12a and the compound information database 12b are described with reference to FIGS. 2 and 3. FIG. 2 is a diagram showing an example of the data structure of the expression information database 12a shown in FIG. 1. In FIG. 2, CLID is the numerical value obtained by removing the prefix "IMAGE:" from the Clone ID and is unique to each gene. NAME is the gene name associated with the cDNA (Type) of the Clone ID. "ME:MALME-3M" and "ME:SK-MEL-28" are names of cancer cells, and the expression level of each gene is shown under each cancer cell name. The CLID and NAME fields are defined by the T-Matrix (expression information) referenced from the NCI database 3. The expression levels shown in FIG. 2 are values for seven representative cancer cells extracted from the 60 types in the NCI database 3, normalized by their mean and variance.

FIG. 3 is a diagram showing an example of the data structure of the compound information database 12b shown in FIG. 1.
In FIG. 3, "NSC No." is a numerical value that identifies a compound. As in FIG. 2, "ME-MALME-3M", "ME-SK-MEL-28", and the like are names of cancer cells, and the pharmacological activity value for each compound is shown under each cancer cell name. For example, the pharmacological activity value a(ω, δ) of a compound δ for a cell ω is calculated by the following Formula 1.

In Formula 1 above, GI₅₀ is the growth inhibitory concentration, which here means the concentration of the compound δ at which the growth of the cancer cell ω is inhibited with a probability of 50%. a_average and a_sd are, respectively, the mean and the variance of the pharmacological activity values of the group of cancer cells for the specified compound. The pharmacological activity value a(ω, δ) obtained by Formula 1 therefore represents the growth inhibitory effect of the compound δ on the cancer cell ω and is a value normalized for each compound.
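Formula 1 itself is not reproduced in this text, so the following is only a minimal sketch of a per-compound normalization of the kind described. It assumes that the raw activity is −log10(GI₅₀) and that the centered value is divided by the population standard deviation (suggested by the name a_sd, although the text says "variance"); both choices, and the function name `normalized_activity`, are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def normalized_activity(gi50_by_cell):
    """Map {cell name: GI50 concentration} for ONE compound to normalized values.

    Assumptions (not taken from the source): raw activity is -log10(GI50),
    and normalization divides by the population standard deviation.
    """
    raw = {cell: -math.log10(gi50) for cell, gi50 in gi50_by_cell.items()}
    a_average = mean(raw.values())
    a_sd = pstdev(raw.values())
    if a_sd == 0:
        # All cells respond identically, so no cell stands out for this compound.
        return {cell: 0.0 for cell in raw}
    return {cell: (value - a_average) / a_sd for cell, value in raw.items()}


example = normalized_activity(
    {"ME:MALME-3M": 1e-6, "ME:SK-MEL-28": 1e-5, "ME:UACC-257": 1e-4}
)
```

Under these assumptions, a cell that is inhibited at a lower concentration than the other cells receives a positive value, and the values for one compound are centered on zero.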

Reference numeral 14 denotes an association analysis processing unit, which refers to the expression information in the expression information database 12a and the compound information in the compound information database 12b, generates related information, which is information on the relationship between the gene expression pattern and the pharmacologically active compounds for the same cancer cell, and registers it in the related information database 12c. Specifically, the association analysis processing unit 14 links gene expression patterns and pharmacological activity values using as a key the entries whose cancer cell name in the expression information matches the cancer cell name in the compound information, and registers the result in the related information database 12c. Because the compound estimation apparatus 1 of the present embodiment estimates compounds that can be expected to have pharmacological activity, the association analysis processing unit 14 sets a lower limit ε for the pharmacological activity value and performs neither the linking nor the registration for compounds whose pharmacological activity value is equal to or less than ε.
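The linking step described above can be sketched as a join on cancer cell name followed by a threshold filter. The dictionary layout and the function name `build_related_info` are assumptions made for illustration and do not reflect the actual layout of the related information database 12c.

```python
def build_related_info(expression, activity, epsilon):
    """Link expression patterns and compound activities that share a cell name.

    expression: {cell name: {CLID: expression level}}
    activity:   {cell name: {NSC No.: pharmacological activity value}}
    epsilon:    lower limit; compounds with activity <= epsilon are not linked
    (All three structures are hypothetical stand-ins for databases 12a to 12c.)
    """
    related = {}
    for cell in expression.keys() & activity.keys():
        kept = {nsc: a for nsc, a in activity[cell].items() if a > epsilon}
        related[cell] = {"expression": expression[cell], "compounds": kept}
    return related


related = build_related_info(
    {"MALME-3M": {101: 0.4, 102: -1.2}, "SK-MEL-28": {101: 0.9}},
    {"MALME-3M": {"NSC-1": 1.5, "NSC-2": 0.1}, "OTHER": {"NSC-1": 2.0}},
    epsilon=0.5,
)
```

In this example only "MALME-3M" appears in both inputs, and only "NSC-1" exceeds the lower limit, so the result holds a single cell linked to a single compound.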

Reference numeral 15 denotes a distance calculation processing unit, which obtains the distance between the gene expression patterns of an unknown cancer cell and a known cancer cell, based on the gene expression pattern of the cancer cell whose pharmacological activity is unknown and the gene expression pattern of the cancer cell whose pharmacological activity is known, the latter being referenced from the related information database 12c. It is assumed that the gene expression pattern of the cancer cell whose pharmacological activity is unknown has been determined and that the distance calculation processing unit 15 has acquired information on that expression pattern. Here, the distance between gene expression patterns is a value indicating the similarity of the gene expression patterns of the two cancer cells; the similarity is calculated by comparing the expression levels of the common genes (for example, by taking their difference) based on the expression information of both cancer cells.

Specifically, when a gene expression pattern X of a cancer cell χ whose pharmacological activity is unknown is observed, the distance calculation processing unit 15 can obtain the distance d(ω, χ) to the gene expression pattern C of a cancer cell ω whose pharmacological activity is known using the following Formulas 2 and 3.

In Formulas 2 and 3, γ is one of the CLIDs included in the expression information stored in the expression information database 12a, and is a CLID that identifies a gene for which an expression level exists in both cancer cells ω and χ. Γ is the set of CLIDs of the genes for which an expression level exists in both cancer cells ω and χ. That is, γ is one of the CLIDs in the set Γ. cγ denotes the expression level in the cancer cell ω of the gene that is identified by the CLID γ and included in the gene expression pattern C. xγ denotes the expression level in the cancer cell χ of the gene that is identified by the CLID γ and included in the gene expression pattern X.

The distance d(ω, χ) obtained by Formulas 2 and 3 above is the Euclidean distance between the expression levels (cγ and xγ) of the genes present in both of the gene expression patterns C and X of the cancer cells ω and χ, normalized by the number of dimensions.
The reason for normalizing by the number of dimensions is that the number of genes for which expression data exist differs greatly from cell to cell. When the gene expression patterns X and C have no gene in common, that is, when |Γ| = 0, the distance cannot be calculated, so the distance calculation processing unit 15 ignores that cancer cell ω.
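The distance calculation can be sketched as follows. Formulas 2 and 3 are not reproduced in this text, so the exact normalization is an assumption: here the Euclidean distance over the shared genes is divided by |Γ|, whereas dividing the sum of squares by |Γ| before taking the square root would be another reading of "normalized by the number of dimensions". The function name `normalized_distance` is hypothetical.

```python
import math


def normalized_distance(c, x):
    """Distance between expression patterns C and X over their shared genes.

    c, x: {CLID: expression level} for the known cell and the unknown cell.
    Returns None when the patterns share no gene (|Γ| = 0), in which case
    the known cell is ignored.
    Assumption: the Euclidean distance is divided by the number of shared genes.
    """
    shared = c.keys() & x.keys()  # the set Γ
    if not shared:
        return None
    squared_sum = sum((c[g] - x[g]) ** 2 for g in shared)
    return math.sqrt(squared_sum) / len(shared)


# Genes 1 and 2 are shared; gene 3 exists only in the unknown cell and is skipped.
d = normalized_distance({1: 0.0, 2: 0.0}, {1: 3.0, 2: 4.0, 3: 9.0})
```

Restricting the sum to the shared genes lets cells with very different numbers of measured genes be compared, which is the motivation given above for the normalization.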

Reference numeral 16 denotes a compound estimation processing unit, which uses the distance obtained by the distance calculation processing unit 15 to estimate the pharmacological activity value, for the unknown cancer cell χ, of a compound δ related to the cancer cell ω, and which obtains from the estimated pharmacological activity value an activity point that estimates the strength of the pharmacological activity of the compound δ against the unknown cancer cell χ.

Specifically, the compound estimation processing unit 16 refers to the related information database 12c to identify the compounds δ related to the cancer cell ω, and estimates the pharmacological activity value e(ω, χ, δ) of the compound δ for the unknown cancer cell χ by calculating the following Formula 4 using the distance d(ω, χ) obtained by the distance calculation processing unit 15. That is, for example, when there is a compound δ1 with a high pharmacological activity value for a specific cancer cell ω1, the more similar the gene expression patterns of the cancer cell ω1 and the cancer cell χ are, the higher the compound estimation processing unit 16 estimates the pharmacological activity value of the compound δ1 for the cancer cell χ to be.

In Formula 4 above, α and β are parameters that determine the degree of influence of the distance and the pharmacological activity value. α affects the range of the estimated pharmacological activity value, and a larger β makes the evaluation of similarity between cells stricter.
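Formula 4 is not reproduced in this text, and the sketch below does not claim to be that formula. It shows one hypothetical weighting, e = α · a · exp(−β · d), chosen only because it behaves as described: α scales the range of the estimate, and a larger β penalizes distance more strongly. The functional form and the name `estimate_activity` are assumptions.

```python
import math


def estimate_activity(a, d, alpha=1.0, beta=1.0):
    """Hypothetical distance-weighted estimate of activity for the unknown cell.

    a:     known activity value a(ω, δ) of compound δ for known cell ω
    d:     distance d(ω, χ) between the known cell and the unknown cell
    alpha: scales the range of the estimated value (assumed role)
    beta:  larger values make dissimilar cells count less (assumed role)
    The exponential form is an illustrative assumption, not the source's Formula 4.
    """
    return alpha * a * math.exp(-beta * d)


near = estimate_activity(2.0, 0.1, alpha=1.0, beta=5.0)
far = estimate_activity(2.0, 1.0, alpha=1.0, beta=5.0)
```

With this form, an identical expression pattern (d = 0) carries the known activity over unchanged apart from the factor α, and the estimate decays as the patterns diverge.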

Next, based on the estimated pharmacological activity values e(ω, χ, δ), the compound estimation processing unit 16 obtains the activity point p(χ, δ) of the compound δ for the unknown cancer cell χ using the following Formulas 5 and 6.

In Formulas 5 and 6 above, Ξ is the subset of the set Ω of all cancer cells ω stored in the expression information database 12a for which the distance d(ω, χ) to the cancer cell χ can be calculated (that is, the set of cancer cells ω for which |Γ| ≠ 0). By the calculation of Formula 5, the final activity point p(χ, δ) is the average, for each compound δ, of the estimated pharmacological activity values e(ω, χ, δ). The compound estimation processing unit 16 sorts the activity points p(χ, δ) of the compounds δ in descending order and outputs, as the estimation result, the combinations of compound identification information ("NSC No.") and activity point for the top 50 compounds.
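The averaging and ranking step can be sketched as follows. Formulas 5 and 6 are not reproduced in this text, so the sketch assumes that each compound's activity point is the plain mean of its estimated values over the cells in Ξ that are linked to that compound; the input layout and the name `rank_compounds` are hypothetical.

```python
def rank_compounds(estimates, top_n=50):
    """Average estimated activities per compound and return the top entries.

    estimates: {cell name in Ξ: {NSC No.: estimated activity e(ω, χ, δ)}}
    Returns a list of (NSC No., activity point) sorted in descending order.
    Assumption: the mean is taken over the cells that have an estimate
    for the compound.
    """
    totals, counts = {}, {}
    for per_compound in estimates.values():
        for nsc, e in per_compound.items():
            totals[nsc] = totals.get(nsc, 0.0) + e
            counts[nsc] = counts.get(nsc, 0) + 1
    points = {nsc: totals[nsc] / counts[nsc] for nsc in totals}
    return sorted(points.items(), key=lambda item: item[1], reverse=True)[:top_n]


ranking = rank_compounds(
    {"cell-1": {"NSC-A": 2.0, "NSC-B": 1.0}, "cell-2": {"NSC-A": 4.0}}
)
```

Here "NSC-A" averages 3.0 over two cells and "NSC-B" averages 1.0 over one cell, so "NSC-A" is ranked first.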

Reference numeral 17 denotes a result display processing unit, which acquires information on the structure of each compound δ from the compound information database 12b based on the estimation result output by the compound estimation processing unit 16, and displays an estimation result screen on the display device of the compound estimation apparatus 1. FIG. 4 is a diagram showing an example of the estimation result screen that the result display processing unit 17 displays on the display device of the compound estimation apparatus 1. FIG. 4 shows, as the estimation result, information on the top 12 compounds δ among the top 50 activity points p(χ, δ) obtained by the compound estimation processing unit 16. As shown in FIG. 4, the screen information for each compound δ consists of a drawing area 41 that illustrates the structure of the compound, a row 42 that gives the name of the compound, a row 43 that gives the "NSC No." identifying the compound, and a row 44 that gives the activity point of the compound for the cancer cell χ. In the present embodiment, no compound names are attached to the compound structure information currently published in the NCI database 3, so the "NSC No." is also shown in row 42, which is intended for the compound name. If information on compound names can also be stored in the compound information database 12b, the compound name is shown in row 42.

Reference numeral 18 denotes a transmission/reception processing unit, which communicates with the NCI database 3 via the network 2. The compound estimation apparatus 1 of the present embodiment has a function for connecting to the network 2 because it uses data stored in the external NCI database 3, but this is not a requirement. Without using an external database, the expression information and compound information may instead be registered and stored in the internal database 12 in advance, for example through an input means. In that case, the compound estimation apparatus 1 does not need a function for connecting to the network 2.

Next, the process by which the compound estimation apparatus 1 shown in FIG. 1 estimates compounds effective against cancer cells is described using a specific example. FIG. 5 shows this estimation process. For the description of FIG. 5, the cancer cell MEL-UACC-257 is assumed, as a specific example, to be a cancer cell whose gene expression pattern is known but for which no pharmacologically active compound is known. In reality, compounds having the Benzothiophenedione structure shown in FIG. 6 are known to have pharmacological activity against the cancer cell "MEL-UACC-257". In other words, if compounds having the Benzothiophenedione structure shown in FIG. 6 can be estimated from the gene expression pattern of the cancer cell "MEL-UACC-257", the compound estimation apparatus 1 of the present embodiment can be said to estimate appropriate compounds.

As shown in FIG. 5, in step S1 the information registration processing unit 13 acquires expression information and compound information from the NCI database 3 via the network 2 and registers them in the expression information database 12a and the compound information database 12b, respectively. Specifically, the information registration processing unit 13 acquires from the NCI database 3, as expression information, the A-Matrix, a data table containing the pharmacological activity values of 4463 compounds (data exist for 4444 of them) against 60 types of cancer cells, and registers it in the expression information database 12a.

The information registration processing unit 13 also acquires the T-Matrix, a data table containing the expression levels of 9704 genes (data exist for 9073 of them) in the 60 types of cancer cells, and registers it in the compound information database 12b. Because the T-Matrix and the A-Matrix write the name of the same cancer cell differently, the information registration processing unit 13 converts the names to one of the two notations (for example, ME:MALME-3M → MEL-MALME-3M). Of the cancer cell data registered in the expression information database 12a and the compound information database 12b, the subsequent processing is performed on the 59 types of cancer cells (cancer cells ω) that remain after excluding the cancer cell MEL-UACC-257 (cancer cell χ), which is treated as the unknown cancer cell in the present embodiment.
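
The cell-name unification mentioned above could look like the following sketch. Only the single conversion ME:MALME-3M → MEL-MALME-3M is given in the text; the prefix table, the function name, and the assumption that the two notations differ only in the prefix and separator are hypothetical.

```python
# Hypothetical prefix table. Only the "ME" -> "MEL" entry is attested by the
# example in the text; entries for other tissue prefixes would have to be
# filled in from the actual data tables.
PREFIX_MAP = {"ME": "MEL"}


def normalize_cell_name(name):
    """Convert a 'PREFIX:CELL' name to the 'PREFIX-CELL' notation.

    Names without a colon are assumed to be in the target notation already
    and are returned unchanged.
    """
    prefix, separator, rest = name.partition(":")
    if not separator:
        return name
    return f"{PREFIX_MAP.get(prefix, prefix)}-{rest}"


print(normalize_cell_name("ME:MALME-3M"))   # MEL-MALME-3M
print(normalize_cell_name("MEL-UACC-257"))  # MEL-UACC-257
```

Normalizing both tables to one notation before joining keeps the later name-keyed linking a plain dictionary lookup.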

The information registration processing unit 13 further acquires the structure information of the 4463 compounds from the NCI database 3 and registers it in the compound information database 12b.

Next, in step S2, the association analysis processing unit 14 links gene expression patterns to pharmacological activity values, using as the key the cancer cell names that match between the expression information and the compound information described above, and registers the result in the related information database 12c. At this time, the association analysis processing unit 14 applies a lower limit ε = 1.0 to the pharmacological activity value and performs neither the linking nor the registration for compounds whose pharmacological activity value is equal to or less than the lower limit ε.
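
The linking in step S2 can be sketched as a join on the cell name followed by the ε filter. The nested-dictionary layout, the function name, and the treatment of missing values as `None` are assumptions made for illustration only.

```python
EPSILON = 1.0  # lower limit of the pharmacological activity value (from the text)


def link_expression_and_activity(expression, activity, epsilon=EPSILON):
    """Join expression patterns and activity values on the cell name.

    expression: {cell_name: {gene_id: expression_level}}
    activity:   {cell_name: {compound_id: activity_value}}
    Compounds whose value is missing or is equal to or less than epsilon
    are neither linked nor registered.
    """
    related = {}
    for cell in expression.keys() & activity.keys():
        kept = {
            compound: value
            for compound, value in activity[cell].items()
            if value is not None and value > epsilon
        }
        related[cell] = {"expression": expression[cell], "activity": kept}
    return related
```

Note the strict comparison `value > epsilon`: a compound whose value is exactly 1.0 is excluded, matching the "equal to or less than" wording.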

Next, in step S3, the distance calculation processing unit 15 obtains the distance d(ω, χ) between the gene expression patterns of the unknown cancer cell χ and each cancer cell ω whose pharmacological activity is known, using Equations 2 and 3 described above. In the present embodiment, the cancer cell "MEL-UACC-257" is processed on the assumption that its compound information is unknown, so the information on its gene expression pattern could be read from the expression information database 12a. When the gene expression pattern of the unknown cancer cell is not stored in the expression information database 12a, it must be acquired by other means, for example by input from an input device (not shown) or by reception via the network 2.
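
The distance computation in step S3 can be sketched as follows. Equations 2 and 3 are not reproduced in this part of the text, so the sketch follows the wording of the claims (a Euclidean distance over the genes common to both patterns, normalized by the number of dimensions) and assumes that normalizing means dividing the distance by the number of shared genes. Dividing the squared sum before taking the root would be another possible reading. The function name and the dictionary layout are hypothetical.

```python
import math


def expression_distance(known, unknown):
    """Distance between two gene expression patterns.

    known, unknown: {gene_id: expression_level or None}
    Only genes with a value in both patterns are compared. Returns None when
    no gene is shared (the |Γ| = 0 case), so that the caller can leave such
    a cell out of the subset Ξ.
    """
    shared = [
        gene
        for gene, level in known.items()
        if level is not None and unknown.get(gene) is not None
    ]
    if not shared:
        return None
    squared_sum = sum((known[gene] - unknown[gene]) ** 2 for gene in shared)
    # Assumption: "normalized by the number of dimensions" is read as
    # dividing the Euclidean distance by the number of shared genes.
    return math.sqrt(squared_sum) / len(shared)
```

Normalizing by the number of shared genes keeps distances comparable between cell pairs that overlap on different numbers of genes, which matters here because expression data are missing for some genes.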

Next, in step S4, the compound estimation processing unit 16 refers to the related information database 12c and estimates the pharmacological activity value e(ω, χ, δ), against the unknown cancer cell χ, of each compound δ linked to a cancer cell ω, by calculation using the distance d(ω, χ) obtained by the distance calculation processing unit 15 and Equation 4 described above. In the present embodiment, the compound estimation processing unit 16 sets the coefficients of Equation 4 to α = 9 and β = 10.

Next, in step S5, the compound estimation processing unit 16 obtains the activity point p(χ, δ) of each compound δ against the unknown cancer cell χ from the pharmacological activity values e(ω, χ, δ) estimated in step S4, using Equations 5 and 6 described above. The compound estimation processing unit 16 of the present embodiment sorts the obtained activity points in descending order and outputs, as the estimation result, the top 50 pairs of compound identification information ("NSC No.") and activity point. In other words, the compound estimation processing unit 16 estimates that compounds with high activity points have pharmacological activity against the cancer cell χ and outputs them.

Next, in step S6, the result display processing unit 17 acquires information on the structure of each compound δ from the compound information database 12b, based on the estimation result output by the compound estimation processing unit 16, and displays on the display device of the compound estimation apparatus 1 an estimation result screen such as that shown in FIG. 7, which lists the top 50 compounds.

FIG. 7 shows an example of the list of 50 compounds actually estimated by the specific processing flow described above. In FIG. 7, the compounds enclosed by dotted lines are compounds having the Benzothiophenedione structure shown in FIG. 6; 10 of the 50 compounds are of this kind. Among all compounds subject to estimation (4444), 23 have the Benzothiophenedione structure, an occurrence rate of 0.52% (rounded to two decimal places). Against that occurrence rate, the compound estimation apparatus 1 estimated compounds having the Benzothiophenedione structure at a rate of 20%. In other words, the compound estimation apparatus 1 of the present embodiment can accurately estimate compounds having the Benzothiophenedione structure shown in FIG. 6 from the gene expression pattern of the cancer cell "MEL-UACC-257". The result in FIG. 7 also shows that the compound estimation method of the compound estimation apparatus 1 of the present embodiment is effective.
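
The figures quoted above can be checked with a few lines of arithmetic. The counts are taken from the text; the enrichment ratio is a derived quantity that the text does not state and is shown only to make the comparison between 0.52% and 20% concrete.

```python
total_compounds = 4444   # compounds subject to estimation
matching_overall = 23    # compounds with the Benzothiophenedione structure
top_n = 50               # compounds listed in FIG. 7
matching_in_top = 10     # of those, compounds with the structure

base_rate = matching_overall / total_compounds
hit_rate = matching_in_top / top_n
enrichment = hit_rate / base_rate  # derived here, not stated in the text

print(f"{base_rate:.2%}")    # 0.52%
print(f"{hit_rate:.2%}")     # 20.00%
print(f"{enrichment:.1f}")   # 38.6
```

Under these counts the structure is roughly 39 times more frequent in the top 50 than in the full compound set.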

The types of information displayed for each compound in FIG. 7 are the same as those shown in FIG. 4 (compound structure, compound name, NSC No.); only the number of compounds displayed differs. The way compounds are displayed is not limited to the methods shown in FIG. 4 and FIG. 7. Any screen suffices as long as it displays at least information that identifies the compound and the estimated value of how much pharmacological activity the compound has against the cancer cell χ (the activity point described above).

Although the compound estimation apparatus 1 in the embodiment described above estimates compounds having pharmacological activity against cancer cells, it is also well suited to other applications, such as estimating compounds with low toxicity to healthy cells. In that case, for example, information on the gene expression patterns of healthy cells serves as the expression information, and information on the toxicity values of compounds against healthy cells serves as the compound information.

When compounds with pharmacological activity against an unknown cancer cell are estimated, as in the embodiment described above, information related to gene expression in cancer cells is registered as expression information in the expression information database 12a. When compounds toxic to an unknown healthy cell are estimated, information related to gene expression in healthy cells is registered as expression information in the compound information database 12b. Similarly, when compounds with pharmacological activity against an unknown cancer cell are estimated, as in the embodiment described above, compounds for which some pharmacological activity against cancer cells has been recognized are registered in the compound information database 12b. When compounds toxic to an unknown healthy cell are estimated, compounds for which some toxicity to healthy cells has been recognized are registered in the compound information database 12b. The association analysis processing unit 14 then associates the expression information and the compound information for the same healthy cell and stores them in the related information database 12c.

In the embodiment described above, each processing unit of the compound estimation apparatus 1 shown in FIG. 1 is composed, in hardware terms, of a memory and a CPU (central processing unit), and realizes its function when a program for that function is loaded into the memory and executed by the CPU. The configuration is not limited to this; part or all of the processing of each processing unit may be realized by dedicated hardware.
The memory is composed of a non-volatile memory such as a hard disk device, a magneto-optical disk device, or a flash memory, a read-only recording medium such as a CD-ROM, a volatile memory such as a RAM (Random Access Memory), or a computer-readable and writable recording medium formed by a combination of these.

As described above, each processing unit of the compound estimation apparatus 1 shown in FIG. 1 is realized by a computer executing a program. Means for supplying that program to the computer, for example a computer-readable recording medium on which the program is recorded or a transmission medium that transmits the program, can also be applied as an embodiment of the present invention. A program product such as a computer-readable recording medium on which the program is recorded can likewise be applied as an embodiment of the present invention. The program, recording medium, transmission medium, and program product described above fall within the scope of the present invention.

The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. The term also includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line.

The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the present invention are also included.

FIG. 1 is a diagram showing the schematic configuration of the compound estimation apparatus of the present embodiment.
FIG. 2 is a diagram showing an example of the data structure of the expression information database 12a shown in FIG. 1.
FIG. 3 is a diagram showing an example of the data structure of the compound information database 12b shown in FIG. 1.
FIG. 4 is a diagram showing an example of the estimation result screen that the result display processing unit 17 displays on the display device of the compound estimation apparatus 1.
FIG. 5 is a diagram showing the process by which the compound estimation apparatus 1 shown in FIG. 1 estimates compounds effective against cancer cells.
FIG. 6 is a diagram showing the structure of Benzothiophenedione.
FIG. 7 is a diagram showing an example of the list of 50 compounds actually estimated by the estimation method of the present embodiment.

Explanation of symbols

1 Compound estimation apparatus
2 Network
3 NCI database
11 Control unit
12 Database
12a Expression information database
12b Compound information database
12c Related information database
13 Information registration processing unit
14 Association analysis processing unit
15 Distance calculation processing unit
16 Compound estimation processing unit
17 Result display processing unit
18 Transmission/reception processing unit

Claims

The compound estimation apparatus according to claim 1, wherein
the compound information is information including pharmacological activity values or toxicity values that quantify the pharmacological activity or toxicity of a plurality of types of compounds against the cell identified by the cell identification information, and
the compound estimation means estimates the pharmacological activity value or toxicity value of the compounds against the unknown cell, based on the distance calculated by the distance calculation means and the compound information for each cell referenced from the related information storage means, and thereby estimates a compound having the pharmacological activity or toxicity against the unknown cell, or a compound having low pharmacological activity or toxicity against the unknown cell.

The compound estimation apparatus according to claim 1 or 2, further comprising:
an expression information database that stores the first expression information;
a compound information database that stores the compound information; and
information registration means that refers to the first expression information in the expression information database and to the compound information in the compound information database, and registers the first expression information and the compound information in the related information storage means in association with the cell identification information that identifies the cell.

The compound estimation apparatus according to any one of claims 1 to 3, wherein the distance calculation means identifies the genes commonly present in both the first expression pattern and the second expression pattern by γ (γ = 1, 2, …, Γ), and calculates the distance by Equation (1), where cγ is the expression level included in the first expression information, xγ is the expression level included in the second expression information, and d is the distance.

The compound estimation apparatus according to claim 3, wherein
the compound information database further stores structure information, which is information on the structure of the compounds,
the apparatus further comprising:
display means that displays various kinds of information; and
estimation result display means that refers to the structure information in the compound information database and causes the display means to display an estimation result screen showing the structure of the compound estimated by the compound estimation means to have the pharmacological activity or toxicity against the unknown cell, or the structure of the compound estimated to have low pharmacological activity or toxicity against the unknown cell.

A compound estimation method using a compound estimation apparatus comprising: related information storage means that stores, in association with cell identification information that identifies a cell, first expression information including the expression level of each gene constituting a first expression pattern, which is the gene expression pattern in the cell identified by the cell identification information, and compound information including information on the pharmacological activity or toxicity of a plurality of types of compounds against the cell identified by the cell identification information; expression information acquisition means; distance calculation means; and compound estimation means, the method comprising:
an acquisition step in which the expression information acquisition means obtains second expression information including the expression level of each gene constituting a second expression pattern, which is the gene expression pattern in an unknown cell whose compound information is unknown;
a distance calculation step in which the distance calculation means calculates a distance by normalizing, by the number of dimensions, the Euclidean distance between the expression levels included in the first expression information and the expression levels included in the second expression information, for the genes commonly present in both the first expression pattern and the second expression pattern; and
a compound estimation step in which the compound estimation means estimates a compound having the pharmacological activity or toxicity against the unknown cell, based on the distance between the unknown cell and each cell calculated in the distance calculation step and the compound information for each cell referenced from the related information storage means.

A program that causes a computer to function as:
related information storage means that stores, in association with cell identification information that identifies a cell, first expression information including the expression level of each gene constituting a first expression pattern, which is the gene expression pattern in the cell identified by the cell identification information, and compound information including information on the pharmacological activity or toxicity of a plurality of types of compounds against the cell identified by the cell identification information;
distance calculation means that, when second expression information including the expression level of each gene constituting a second expression pattern, which is the gene expression pattern in an unknown cell whose compound information is unknown, is obtained, calculates a distance by normalizing, by the number of dimensions, the Euclidean distance between the expression levels included in the first expression information and the expression levels included in the second expression information, for the genes commonly present in both the first expression pattern and the second expression pattern; and
compound estimation means that estimates a compound having the pharmacological activity or toxicity against the unknown cell, or a compound having low pharmacological activity or toxicity against the unknown cell, based on the distance between the unknown cell and each cell calculated by the distance calculation means and the compound information for each cell referenced from the related information storage means.