CN119479958A - A clinical trial quality assessment method and system based on artificial intelligence - Google Patents

A clinical trial quality assessment method and system based on artificial intelligence

Publication number
CN119479958A
Authority
CN
China
Prior art keywords
data
clinical
quality
test
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510062732.1A
Other languages
Chinese (zh)
Inventor
黄明光
梁潇
于佳莉
张爱玲
庞浩鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Detong Xing Pharmaceutical Polytron Technologies Inc
Original Assignee
Beijing Detong Xing Pharmaceutical Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Detong Xing Pharmaceutical Polytron Technologies Inc
Priority to CN202510062732.1A
Publication of CN119479958A
Legal status: Pending

Abstract

Translated from Chinese

The present invention discloses a clinical trial quality assessment method and system based on artificial intelligence, and relates to the technical field of clinical trial quality assessment. The present invention basically realizes automated operation from data collection to final report generation through a series of automated data processing and model assessment processes, and does not require manual complex assessment calculations on a large amount of data, which greatly improves the assessment efficiency; by relying on artificial intelligence algorithms and rule-based expert systems for assessment, the influence of subjective factors on the assessment results is effectively reduced, and the consistency and accuracy of the assessment results are improved; by processing various types of high-dimensional data through artificial intelligence algorithms, extracting key feature subsets through feature extraction and screening, and then using trained deep learning models for assessment, and by continuously optimizing model performance to adapt to the ever-changing large-scale, high-dimensional data situation, the needs of efficient and accurate assessment are effectively met.

Description

Clinical trial quality assessment method and system based on artificial intelligence
Technical Field
The invention relates to the technical field of clinical trial quality evaluation, in particular to an artificial intelligence-based clinical trial quality evaluation method and system.
Background
Clinical trials play an important and indispensable role in fields such as drug development and the determination of disease treatment regimens. They act as a bridge connecting theoretical laboratory research with its translation into clinical practice. A high-quality clinical trial can clearly reveal multidimensional information about a treatment regimen in a specific patient population, such as differences in efficacy, survival benefit, and the degree of reduction in recurrence risk, thereby providing a scientific and reliable reference for clinicians formulating personalized, precise treatment plans and allowing patients to receive the most suitable treatment. The quality of a trial is like the foundation stone of a tall building: it fundamentally determines the reliability and validity of the trial results. A high-quality trial ensures that the acquired data are true, accurate, and complete, which strongly supports the subsequent statistical analysis and the conclusions drawn from the data, and ensures that the trial results are highly reliable and reproducible. Traditional clinical trial quality assessment relies mainly on manual auditing, which has several shortcomings:
1. Manual evaluation is inefficient: faced with massive trial data and complex evaluation standards, it consumes a great deal of labor and time;
2. Manual evaluation is easily affected by subjective factors: different evaluators may reach different conclusions from the same data, so the consistency and accuracy of the evaluation results are difficult to ensure;
3. As clinical trials continue to grow in scale and data dimensionality keeps increasing, traditional manual evaluation finds it increasingly difficult to meet the requirements of efficient and accurate evaluation.
Therefore, an artificial intelligence-based clinical trial quality evaluation method and system are provided.
Disclosure of Invention
The invention aims to provide an artificial intelligence-based clinical trial quality assessment method and system, which are used for solving one of the problems in the background technology.
In order to solve the technical problems, the application adopts a technical scheme that the clinical test quality assessment method based on artificial intelligence comprises the following steps:
Step one, acquiring clinical trial data, wherein the clinical trial data are acquired from clinical trial information systems of medical institutions, medical databases, and trial data repositories of scientific research institutions;
Step two, constructing a clinical trial data set: cleaning the clinical trial data, removing noise data, erroneous data, and duplicate data, and sorting and classifying the data according to a preset data format;
Step three, extracting features from the clinical trial data set based on an artificial intelligence algorithm to obtain key features, screening out a feature subset by means of a mutual-information-based feature selection algorithm and a principal component analysis algorithm, and acquiring the quality evaluation labels of the clinical trial data corresponding to the feature subset;
Step four, constructing a deep learning model from the feature subset and the quality evaluation labels of the clinical trial data corresponding to the feature subset, and training the deep learning model;
Step five, evaluating the trained deep learning model using a test data set, and optimizing the deep learning model according to the evaluation result;
Step six, obtaining the feature subset of the clinical trial data to be evaluated, and inputting the feature subset into the deep learning model to obtain a quality evaluation result;
Step seven, integrating the quality evaluation result with the in-depth evaluation and verification result to generate a clinical trial quality evaluation report.
As a further preferred aspect of the present invention, in step one, the clinical trial data includes patient basic information, medical history, treatment course data, examination test result data, trial results, adverse reaction records, and trial medication.
As a further preferred aspect of the present invention, in step two, the method of constructing a clinical trial data set comprises the steps of:
Step 1, removing abnormal values exceeding a normal physiological range or a preset test data range by utilizing a data range checking rule;
step 2, screening out data which does not accord with the specified format through data format verification, and correcting or marking;
Step 3, identifying and deleting completely repeated data entries by utilizing a data repeatability detection algorithm;
Step 4, classifying and storing the cleaned clinical trial data according to patient number, trial item number, and data type so as to construct a clinical trial data set.
In the third step, the artificial intelligence algorithm comprises a convolutional neural network, an autoencoder neural network, and natural language processing techniques, wherein the convolutional neural network is used for extracting features from image data in the clinical trial data set, the autoencoder neural network is used for extracting features from numerical data, and the natural language processing techniques are used for extracting features from text data;
The key features include data integrity related features, data consistency features, data accuracy features, test process related features, and sample related features.
In the fourth step, the deep learning model is a multi-layer neural network model, and the number of input layer nodes of the deep learning model is matched with the number of features of the feature subset.
In the fifth step, the trained deep learning model is evaluated by using the independent test data set, and the accuracy, recall, F1 value and AUC value of the deep learning model are calculated as evaluation indexes in the evaluation process.
In the sixth step, after the feature subset is input into the deep learning model, the deep learning model outputs a quality evaluation result and generates a corresponding confidence score.
In the sixth step, the expert system comprises a rule base, wherein the rule base is constructed based on key quality points summarized by international clinical test standard, industry authority guidelines and historical successful clinical test cases, and the rule base is provided with a dynamic updating mechanism.
As a further preferred aspect of the present invention, in step seven, the clinical trial quality assessment report adopts a standardized document format, and its content comprises basic trial information, details of each assessment result, and a comprehensive assessment conclusion.
In order to solve the technical problems, the application adopts another technical scheme that the clinical trial quality evaluation system based on artificial intelligence comprises a data acquisition module, a data cleaning and sorting module, a feature extraction and selection module, a model training module, a model evaluation and optimization module, a quality evaluation module, a depth evaluation module and a report generation module;
the data acquisition module is used for acquiring clinical test data from each link of a clinical test, wherein the clinical test data comprises basic information, medical history, treatment process data, inspection and test result data, test results, adverse reaction records and test drugs of a patient;
The data cleaning and sorting module is used for cleaning the collected clinical test data, removing noise data, error data and repeated data, sorting and classifying according to a preset data format, and constructing a clinical test data set;
The feature extraction and selection module is used for extracting features of the clinical test data set based on an artificial intelligence algorithm, extracting key features reflecting the quality of the clinical test, and screening out feature subsets for evaluating the quality of the clinical test through a feature selection algorithm based on mutual information and a principal component analysis algorithm;
The model training module is used for constructing a deep learning model, taking the feature subset as input data, taking a known clinical test quality evaluation result as an output label, and training the deep learning model;
the model evaluation and optimization module is used for evaluating the trained deep learning model by using the test data set and optimizing the deep learning model according to the evaluation result;
the quality evaluation module is used for inputting the feature subset of the clinical test data to be evaluated into the deep learning model to obtain a quality evaluation result;
The in-depth evaluation module is used for performing, by means of an expert system combined with manual auditing, an in-depth evaluation of clinical trials flagged as having quality problems, to obtain an in-depth evaluation and verification result;
And the report generation module is used for integrating the quality evaluation result, the depth evaluation and the verification result to generate a clinical test quality evaluation report.
The invention has the advantages that:
1. Through a series of automated data processing and model evaluation processes, the invention essentially automates the workflow from data acquisition to final report generation; large amounts of data no longer need to be processed one by one and subjected to complex evaluation calculations by hand, which greatly improves evaluation efficiency;
2. By relying on artificial intelligence algorithms and a rule-based expert system for evaluation, the invention effectively reduces the influence of subjective factors on the evaluation result and improves the consistency and accuracy of the evaluation result;
3. The invention processes various types of high-dimensional data through artificial intelligence algorithms, extracts a key feature subset through feature extraction and screening, and evaluates it using a trained deep learning model; by continuously optimizing model performance, it adapts to changing large-scale, high-dimensional data, thereby effectively meeting the requirements of efficient and accurate evaluation.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an artificial intelligence based clinical trial quality assessment method of the present invention;
FIG. 2 is a flow chart of a method of constructing a clinical trial data set according to the present invention;
FIG. 3 is a schematic diagram of functional blocks of an artificial intelligence based clinical trial quality assessment system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
FIG. 1 is a flow chart of an artificial intelligence-based clinical trial quality assessment method according to an embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the flow sequence shown in FIG. 1. As shown in FIGS. 1-2, an artificial intelligence-based clinical trial quality assessment method comprises the following steps:
step one, acquiring clinical test data, wherein the clinical test data are acquired from a clinical test information system of a medical institution, a medical database and a test data storage library of a scientific research institution;
Specifically, secure and stable data connection interfaces are established with the clinical trial information systems of major medical institutions, medical databases, and the trial data repositories of relevant scientific research institutions. The interfaces conform to international data transmission standards (such as HL7) and use an encrypted transmission protocol (such as SSL/TLS) to keep data secure in transit. An authorization mechanism for data acquisition is defined so that only authorized data that comply with ethical norms are acquired. In addition, a targeted data acquisition plan is set for the different stages of a clinical trial (such as patient recruitment, enrollment, treatment, and follow-up), specifying in detail the type, scope, and acquisition frequency of the data to be collected at each stage, which ensures the integrity and timeliness of the data.
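As an illustration only, the encrypted channel, authorization check, and per-stage acquisition plan described above might be sketched as follows. The source identifiers, plan fields, and frequencies are hypothetical and are not taken from the application; HL7 message handling is not shown.

```python
import ssl
from dataclasses import dataclass

@dataclass(frozen=True)
class StagePlan:
    """Hypothetical acquisition plan for one trial stage (field names are illustrative)."""
    stage: str
    data_types: tuple
    frequency_days: int

def make_tls_context() -> ssl.SSLContext:
    # The default client context verifies server certificates and host names.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

# Assumed identifiers for the three kinds of source named in step one.
AUTHORIZED_SOURCES = {"hospital_ctis", "medical_db", "research_repo"}

def may_collect(source_id: str, has_ethics_approval: bool) -> bool:
    """Only authorized sources with ethics approval may be queried."""
    return source_id in AUTHORIZED_SOURCES and has_ethics_approval

# Illustrative plan: what to collect at each stage, and how often.
ACQUISITION_PLAN = [
    StagePlan("recruitment", ("basic_information", "medical_history"), 7),
    StagePlan("treatment", ("treatment_course", "exam_results", "adverse_reactions"), 1),
    StagePlan("follow_up", ("trial_results",), 30),
]
```

The context returned by `make_tls_context()` would be passed to whatever client library opens the connection to a data source.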
Step two, constructing a clinical test data set, cleaning the clinical test data, removing noise data, error data and repeated data, and sorting and classifying according to a preset data format;
Specifically, the cleaned and arranged data are combined into one data set, the integrity and consistency of the data are ensured, the integrated data set is subjected to secondary verification, the accuracy and reliability of the data are ensured, and descriptive labels are added for each variable in the data set so as to facilitate subsequent analysis and interpretation.
Step three, extracting features from the clinical test data set based on an artificial intelligence algorithm, extracting key features, screening out feature subsets through a feature selection algorithm and a principal component analysis algorithm of mutual information, and acquiring quality evaluation labels of the clinical test data corresponding to the feature subsets;
Specifically, a suitable machine learning or deep learning model (such as a convolutional neural network (CNN), a recurrent neural network (RNN), a decision tree, or a random forest) is selected, so that potential features in the data can be learned and extracted automatically. The feature extraction flow is designed and optimized by combining expert knowledge with the characteristics of the data, to ensure that the extracted features are representative and easy to interpret. From the extracted features, a feature subset is then screened out whose members significantly influence clinical trial quality evaluation and are independent of one another. The quality evaluation labels are determined in advance, in previous research or in existing data sets, for the corresponding clinical trial data according to certain evaluation standards and methods; they represent the quality of a clinical trial in certain respects, for example along dimensions such as trial effectiveness, safety, and patient satisfaction.
The method for screening the feature subsets specifically comprises the following steps:
Firstly, mutual information analysis is carried out, the relevance between each feature and the quality evaluation result of the clinical trial is measured, and the mutual information is calculated;
Then, carrying out principal component analysis, and projecting high-dimensional data into a low-dimensional space on the premise of keeping main information of the data so as to eliminate redundancy and noise;
finally, combining the results of the mutual information analysis and the principal component analysis together to determine a final feature subset, wherein the subset not only contains features highly relevant to clinical trial quality evaluation, but also ensures independence among the features, thereby improving the accuracy and generalization capability of an evaluation model.
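The screening procedure above can be sketched in plain Python. The application does not say how the two analyses are combined, so this sketch assumes one simple option: rank features by mutual information with the label, keep the top k, and then project the kept columns onto their leading principal direction. Features are assumed to be discretized before the mutual information is computed, and all function names are illustrative.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def select_by_mi(features, labels, k):
    """Return the k feature names with the highest mutual information, plus all scores."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration.

    Needs at least two rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance along the current direction
            break
        v = [x / norm for x in w]
    return v, centered

def project(centered, direction):
    """Coordinates of each centered row along the given direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]
```

A feature identical to the label scores the label's entropy, and a feature independent of the label scores zero, which is what makes the ranking usable as a filter.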
Step four, constructing a deep learning model according to the feature subsets and quality evaluation labels of clinical test data corresponding to the feature subsets, and training the deep learning model;
Specifically, the feature subset screened in step three is first used as the input data of the deep learning model, and the quality evaluation labels of the clinical trial data corresponding to the feature subset are used as the output labels, or target variables, of the model; these labels may cover multiple dimensions such as the effectiveness and safety of the trial and patient satisfaction;
Then, a suitable deep learning model is selected according to the complexity of the problem and the characteristics of the data; common models include the multi-layer perceptron (MLP), the convolutional neural network (CNN), the recurrent neural network (RNN), the long short-term memory network (LSTM), and combinations or variants thereof;
Then, to ensure stable and efficient model training, the input data generally need to be standardized or normalized, and if the amount of data is insufficient, data augmentation may be used to increase the diversity of the data;
Finally, a suitable loss function, such as mean square error (MSE) or cross-entropy loss, is selected to measure the difference between the model's predictions and the actual labels, and an effective optimization algorithm, such as stochastic gradient descent (SGD) or Adam, is selected to update the weights of the model. The preprocessed data are input into the model, and the loss function is minimized through iterative training, thereby optimizing the parameters of the model. During training, the performance of the model, such as accuracy and loss value, is monitored and adjustments are made as required.
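The training loop described above (standardize the inputs, choose a loss, update the weights with SGD) is sketched below. For brevity it uses a single sigmoid unit in place of the multi-layer network the application describes, and binary labels (1 = acceptable quality) are an assumption. The learning rate, epoch count, and function names are illustrative.

```python
import math
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance; return the data and the statistics."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return scaled, means, stds

def sigmoid(z):
    # Numerically stable form for large negative inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def cross_entropy(X, y, w, b, eps=1e-12):
    """Mean binary cross-entropy between predictions and labels."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(xi, w, b)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)

def train_sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent on the cross-entropy loss, one sample per update."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # For a sigmoid output, d(loss)/d(pre-activation) = p - y.
            g = predict_proba(X[i], w, b) - y[i]
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b
```

The means and standard deviations returned by `standardize` need to be kept, because data assessed later must be scaled with the same statistics.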
Step five, evaluating the trained deep learning model by using a test data set, and optimizing the deep learning model according to an evaluation result;
Specifically, an independent test data set is first split off from the original data set; it is not used during model training, which keeps the evaluation result fair and accurate. The trained deep learning model is applied to the test data set, and its performance indicators, such as accuracy, recall, F1 score, and the AUC-ROC curve, are calculated and recorded. The data in the original data set come from several authoritative channels, including clinical trial information systems of medical institutions, medical databases, and trial data repositories of scientific research institutions. These sources cover rich clinical-trial-related information and provide diversified data support for comprehensively and accurately evaluating clinical trial quality. For example, the clinical trial information system of a medical institution records detailed patient enrollment, the operations performed during treatment, and the observed results; a medical database may contain long-accumulated patient histories, examination and test results, and other data; and the trial data repository of a scientific research institution stores specialized data collected for a specific research topic;
Then, the performance indicators of the model on the test data set are compared with preset requirements to identify where the model falls short. The samples the model predicted incorrectly are analyzed in depth to understand the error types (such as misclassification) and their possible causes, and feature importance assessment tools (such as SHAP and LIME) are used to analyze which features most influence the model's predictions and whether the model depends too heavily on certain features;
Then, for insufficient or imbalanced data, data augmentation is adopted or the data preprocessing flow is redesigned. The architecture of the model is adjusted according to the results of the error analysis and feature importance analysis, for example by adding or removing hidden layers, changing the number of neurons, or introducing new layers or modules. The hyperparameters of the model are fine-tuned using methods such as grid search, random search, or Bayesian optimization, and regularization techniques (such as L1 and L2 regularization), Dropout layers, and early stopping are introduced to prevent the model from overfitting during training;
Finally, the optimized model is retrained and its performance on the test data set is evaluated. During training, the performance indicators and loss values of the model on the training set and the validation set are continuously monitored to ensure that model performance improves steadily. Iterative optimization stops when the performance indicators on the test data set reach the preset requirements or the improvement is no longer significant. After it stops, the model is evaluated one last time on a final test data set to confirm the stability and reliability of its performance, and key information such as the development process, architecture, hyperparameter settings, and performance indicators of the model is recorded for subsequent use and reproduction.
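The evaluation indicators named in step five can be computed as in the following sketch, which assumes binary labels (1 = acceptable quality) and model scores in [0, 1]. The 0.5 decision threshold and the form of the preset requirements are illustrative assumptions.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, and F1 for binary labels, thresholding the model scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def meets_requirements(metrics, required):
    """True when every preset minimum is reached; otherwise optimization continues."""
    return all(metrics[name] >= minimum for name, minimum in required.items())
```

A call such as `meets_requirements(metrics, {"recall": 0.8})` would then serve as the stopping test for the iterative optimization, with the minimum values set per project.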
Step six, acquiring a feature subset of clinical test data to be evaluated, and inputting the feature subset into a deep learning model to obtain a quality evaluation result;
Specifically, first, clinical trial data to be evaluated are collected, which data should cover all critical aspects of the trial, such as patient information, treatment regimen, trial results, etc.;
then, according to the method in the second step, the collected data is cleaned, abnormal values are removed, format errors are corrected, and repeated data are deleted;
Then, following the feature extraction method in step three, extracting key features from the cleaned data that can reflect the quality of the clinical trial, which may include baseline features of the patient, characteristics of the treatment regimen, key events during the trial, etc.;
then, utilizing a feature selection algorithm (such as mutual information combined with principal component analysis) in the third step to screen out feature subsets which have important influence on clinical test quality evaluation and are mutually independent from each other from the extracted features;
Finally, loading the deep learning model trained and optimized in the step five, preprocessing the selected feature subset according to the requirement of model input, such as standardization or normalization, inputting the preprocessed feature subset into the deep learning model, carrying out quality assessment on clinical test data to be assessed based on the knowledge learned by the model, and outputting an assessment result;
Evaluating raw clinical trial data directly is challenging because the data are high-dimensional and complex and may contain a large amount of redundant information. Obtaining the feature subset through feature extraction and screening concentrates the key information in the data, reduces the data dimension, and removes noise and redundancy, so that the deep learning model can learn the patterns and relationships related to clinical trial quality more effectively, which improves evaluation accuracy and efficiency. The model evaluates the feature subset of the new data based on the feature patterns it learned from a large amount of historical clinical trial data, so it can better adapt to complex and changeable clinical trial data;
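A minimal sketch of this assessment step follows. It assumes the trained model can be reduced to a scoring function over the standardized feature subset; a linear score with a sigmoid stands in for the deep learning model, and the label names and the confidence definition are illustrative.

```python
import math

def assess(features, means, stds, weights, bias, threshold=0.5):
    """Score one trial's feature subset and report a label with a confidence score."""
    # Reuse the statistics computed on the training data for preprocessing.
    z = [(x - m) / s for x, m, s in zip(features, means, stds)]
    logit = sum(w * v for w, v in zip(weights, z)) + bias
    p = 1.0 / (1.0 + math.exp(-logit))
    label = "acceptable" if p >= threshold else "quality concern"
    # Confidence here is the probability assigned to the reported label.
    return {"label": label, "probability": p, "confidence": max(p, 1.0 - p)}
```

A confidence close to 0.5 marks a borderline case, which is a natural candidate for the in-depth evaluation that follows.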
An expert system, combined with manual auditing, is then used to perform an in-depth evaluation of the quality evaluation result to obtain an in-depth evaluation and verification result;
Specifically, firstly, a set of comprehensive rule base is constructed based on professional knowledge and experience in the clinical test field, the rules cover multiple aspects of rationality of clinical test design, data collection integrity, statistical analysis accuracy and the like, clinical test data which are preliminarily estimated by a deep learning model and possibly have quality problems or are in a critical state are matched with the rule base, and a test needing further deep estimation is screened out;
Then, an audit team consisting of clinical trial experts, statisticians, data scientists, and others is formed to manually audit the screened trials. The team reviews and analyzes each trial in detail according to its specific circumstances, which may include re-reviewing the trial design, re-checking data quality, and checking the statistical analysis process; during this in-depth evaluation the team identifies and records possible problems or defects in the trial.
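The rule-base matching and screening described above might look like the following sketch. The three rules, their thresholds, the trial field names, and the borderline confidence cut-off are all hypothetical; a real rule base would be derived from the standards and guidelines the application mentions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    """One entry of the rule base: a name and a predicate that detects a violation."""
    name: str
    violated: Callable[[dict], bool]

# Hypothetical rules covering data completeness, safety reporting, and analysis planning.
RULES = [
    Rule("missing_data_rate_too_high", lambda t: t["missing_rate"] > 0.10),
    Rule("unreported_adverse_events", lambda t: t["ae_reported"] < t["ae_observed"]),
    Rule("no_predefined_analysis_plan", lambda t: not t["has_analysis_plan"]),
]

def screen(trial: dict, model_confidence: float, borderline: float = 0.7) -> dict:
    """Flag a trial for manual audit if any rule fires or the model was unsure."""
    violations = [rule.name for rule in RULES if rule.violated(trial)]
    needs_audit = bool(violations) or model_confidence < borderline
    return {"violations": violations, "needs_manual_audit": needs_audit}
```

Keeping each rule as data rather than as hard-coded branches is one way to support the dynamic updating of the rule base that the application calls for.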
Step seven, integrating the quality evaluation result, the depth evaluation and the verification result to generate a clinical test quality evaluation report;
specifically, firstly, summarizing a preliminary evaluation result of a deep learning model, a test list which is screened by a rule-based expert system and may have problems or be in a critical state, and results of manual auditing and deep evaluation, so as to ensure that all integrated data are accurate, complete and consistent, and the data may need to be cleaned and checked again to avoid errors or omission in the integration process;
And then, designing a framework of a clinical test quality evaluation report, including introduction, an evaluation method, an evaluation result, problem analysis, suggestion, conclusion and the like, and arranging the integrated data according to the report framework to ensure clear content and strict logic.
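The report assembly can be sketched as below, using the section list from the framework above. The section contents, the function names, and the choice of JSON as the standardized output format are assumptions made for illustration.

```python
import json

# Section order taken from the report framework described in the text.
SECTIONS = (
    "introduction",
    "evaluation_method",
    "evaluation_result",
    "problem_analysis",
    "suggestions",
    "conclusion",
)

def build_report(trial_id, model_result, expert_findings, audit_notes):
    """Merge the model output, expert-system findings, and audit notes into one report."""
    problems = list(expert_findings) + list(audit_notes)
    report = {
        "introduction": f"Quality assessment of clinical trial {trial_id}.",
        "evaluation_method": "Deep learning model, rule-based expert system, manual audit.",
        "evaluation_result": model_result,
        "problem_analysis": problems,
        "suggestions": [f"Review: {p}" for p in problems],
        "conclusion": (
            f"{len(problems)} issue(s) require follow-up."
            if problems
            else "No issues recorded."
        ),
    }
    missing = [s for s in SECTIONS if s not in report]
    if missing:
        raise ValueError(f"report is missing sections: {missing}")
    return report

def render(report):
    """Serialize the report in a fixed, machine-readable layout."""
    return json.dumps(report, indent=2, ensure_ascii=False)
```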
In one embodiment, in particular, in step one, the clinical trial data includes patient baseline information, medical history, treatment course data, examination test result data, trial results, adverse reaction records, and trial medication;
The basic information of the patient comprises the patient's name (or anonymous identifier), age, sex, height, weight, contact details (if applicable), past medical history (not specific to the trial), and the like;
Medical history, such as past illness, family medical history, allergy history, etc., specific to the medical history related to the test;
The treatment process data comprises the time of starting and ending treatment, treatment frequency, treatment dosage and the like, and also comprises the compliance condition of patients in the treatment process, such as whether to take medicine on time, whether to follow medical advice and the like;
Examination test result data, including the results of various examinations (e.g., blood examinations, imaging examinations, etc.) performed by the patient during the test, which are typically used to assess the health condition, therapeutic effect, and the presence or absence of adverse reactions of the patient;
Test results, including primary and secondary endpoint data of the test, for assessing the effectiveness of a test drug or treatment regimen, which may include patient survival, disease progression, degree of symptom improvement, etc.;
Test drugs, including the name, specification, manufacturer, lot number, etc. of the test drugs.
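One possible in-memory layout for the data categories listed above is sketched here. The class and field names, units, and types are hypothetical. The adverse reaction record is included because step one lists it, although its contents are not detailed in this embodiment.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatientInfo:
    anonymous_id: str               # anonymous identifier used in place of the name
    age: int
    sex: str
    height_cm: float
    weight_kg: float
    contact: Optional[str] = None   # only where applicable

@dataclass
class TrialDrug:
    name: str
    specification: str
    manufacturer: str
    lot_number: str

@dataclass
class ClinicalTrialRecord:
    patient: PatientInfo
    trial_drug: TrialDrug
    medical_history: list = field(default_factory=list)    # trial-related history
    treatment_course: list = field(default_factory=list)   # times, doses, compliance
    exam_results: list = field(default_factory=list)       # blood tests, imaging, ...
    trial_results: dict = field(default_factory=dict)      # primary/secondary endpoints
    adverse_reactions: list = field(default_factory=list)
```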
In one embodiment, in particular, in step two, a method of constructing a clinical trial data set comprises the steps of:
Step 1, removing abnormal values that exceed a normal physiological range or a preset trial data range by using a data range checking rule. Specifically, a reasonable data range is first set for each key variable according to physiological common sense, the clinical trial design, and preliminary research results; a range check is then performed on the key variables of each record using a programming tool (such as Python or R) or a database management system; finally, records outside the data range are either deleted or marked as abnormal values, the choice depending on the research purpose and the data analysis plan.
Step 2, screening out data that do not conform to the specified format through data format verification, and correcting or marking them. Specifically, a specific data format, such as a date format, a numerical format, or a text format, is first defined for each variable according to the research design and data collection requirements; a programming tool or a database management system is then used to check whether the data format of each record meets the requirements; finally, records that do not conform are either corrected (for example, by converting a date stored as text into a standard date format) or marked as format errors for subsequent processing.
Step 3, identifying and deleting completely duplicated data entries by using a data duplication detection algorithm. Specifically, a suitable duplication detection algorithm, such as a hash algorithm or a similarity calculation, is first selected according to the characteristics of the data and the research requirements; duplication detection is then performed on the data set using a programming tool or a database management system; finally, for each set of duplicated entries detected, one valid copy is retained and the remaining duplicates are deleted.
The method comprises the steps of (1) carrying out classified storage on clinical test data after cleaning according to patient numbers, test item numbers and data types to construct a clinical test data set with clear completion level and easy management, specifically, firstly, determining classification standards of the data, such as the patient numbers, the test item numbers and the data types, according to study design and data analysis requirements, then, using a programming tool or a database management system to classify the data according to the classification standards, and finally, storing the classified data in a proper storage medium (such as a database, a file system and the like), and ensuring the accessibility and the safety of the data.
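The four cleaning steps above can be sketched in Python, one of the programming tools the description mentions. This is only a minimal illustration under stated assumptions: the field names (`patient_id`, `trial_id`, `data_type`, `visit_date`), the numeric ranges, and the accepted date spellings are hypothetical and are not specified in the original text; the sketch flags out-of-range and badly formatted records rather than deleting them, and uses a hash for exact-duplicate detection.

```python
import hashlib
from datetime import datetime

# Hypothetical plausible ranges for key variables (illustrative values only).
RANGES = {"age": (0, 120), "systolic_bp": (50, 250)}


def check_ranges(record):
    """Step 1: return the names of key variables that fall outside their range."""
    return [
        name for name, (low, high) in RANGES.items()
        if name in record and not (low <= record[name] <= high)
    ]


def normalize_date(text):
    """Step 2: convert a few assumed date spellings to ISO format, else None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def record_hash(record):
    """Step 3: hash the sorted field/value pairs so exact duplicates collide."""
    canonical = repr(sorted(record.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def clean(records):
    """Apply steps 1-4; return records grouped by (patient, trial, data type)."""
    seen, dataset = set(), {}
    for raw in records:
        record = dict(raw)  # work on a copy, leave the input untouched
        iso = normalize_date(str(record.get("visit_date", "")))
        record["format_error"] = iso is None  # flag instead of deleting
        if iso is not None:
            record["visit_date"] = iso
        digest = record_hash(record)
        if digest in seen:  # duplicate after date normalization: keep first copy
            continue
        seen.add(digest)
        record["out_of_range"] = check_ranges(record)  # flag instead of deleting
        # Step 4: classified storage keyed by patient, trial item, and data type.
        key = (record["patient_id"], record["trial_id"], record["data_type"])
        dataset.setdefault(key, []).append(record)
    return dataset
```

Whether flagged records are later deleted or kept would, as the description says, depend on the research purpose and the data analysis plan.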
In a specific embodiment, in step three, the artificial intelligence algorithm comprises a convolutional neural network, an autoencoder neural network, and natural language processing techniques. The convolutional neural network is used to extract features from the image data in the clinical trial data set, where the image data may include patients' medical images (such as X-ray films and CT scans), image records from trial equipment, and the like;
The autoencoder neural network is used to extract features from the numerical data, where the numerical data may include patients' physiological indicators (such as blood pressure and heart rate), laboratory test results, and the like;
The natural language processing techniques are used to extract features from the text data, where the text data may include patients' medical records, physicians' diagnostic reports, notes made during the trial, and the like;
The key features comprise data completeness features, data consistency features, data accuracy features, trial process features, and sample features;
Data completeness features, including the completeness of the data and the proportion of missing values, used to evaluate whether the data covers all aspects of the trial;
Data consistency features, including the consistency of data across different time points and different sources, used to detect potential errors or inconsistencies in the data;
Data accuracy features, including the accuracy, precision, and reliability of the data, used to evaluate the authenticity and credibility of the data;
Trial process features, including the soundness of the trial design, the compliance of the trial process, and patient compliance, used to evaluate the execution quality and reliability of the trial;
Sample features, including the sample selection criteria, the representativeness of the sample, and the sample size, used to evaluate whether the sample is sufficient to support reliable trial results.
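Two of the key features listed above, the proportion of missing values and the consistency of data between sources, are simple enough to illustrate directly. The following is a hedged sketch only: the function names and the dictionary-based record layout are assumptions for illustration, and the original text does not say how these features are actually computed.

```python
def missing_ratio(records, fields):
    """Proportion of expected field values that are absent or None."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    missing = sum(1 for r in records for f in fields if r.get(f) is None)
    return missing / total


def inconsistent_fields(source_a, source_b, fields):
    """Fields whose values differ between two sources for the same patient."""
    return [f for f in fields if source_a.get(f) != source_b.get(f)]
```

For example, comparing a case report form entry against the matching hospital record with `inconsistent_fields` would list the fields that need manual reconciliation.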
In a specific embodiment, in step four, the deep learning model is a multi-layer neural network whose number of input-layer nodes matches the number of features in the feature subset;
The deep learning model has at least two hidden layers. The first hidden layer has 1.5 to 2 times as many neurons as there are features in the feature subset, each subsequent hidden layer has fewer neurons than the one before it, and the hidden layers use the ReLU activation function to strengthen the model's nonlinear expressive capacity;
The number of output-layer nodes equals the number of categories of the clinical trial quality assessment result, and a Softmax function outputs the class probabilities, enabling prediction of multi-category assessment results;
The deep learning model is trained with mini-batch gradient descent, using 16 to 64 samples per mini-batch and an initial learning rate of 0.0005 to 0.01; a learning rate decay strategy gradually lowers the learning rate as the number of training epochs increases, which helps prevent overfitting;
During training, early stopping and cross-validation are used to prevent overfitting and to improve the stability and generalization ability of the model;
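The sizing rules and activation functions described above can be sketched without any deep learning framework. This is an illustrative sketch, not the patented implementation: the halving factor between hidden layers and the exponential decay schedule are assumptions, since the text only requires that later layers shrink and that the learning rate decreases over epochs.

```python
import math


def hidden_layer_sizes(n_features, n_hidden=2, first_ratio=1.5, shrink=0.5):
    """Hidden-layer widths: first layer 1.5-2x the feature count, then shrinking.

    `shrink` (halving by default) is an assumed rule for the later layers.
    """
    if n_hidden < 2:
        raise ValueError("the description requires at least two hidden layers")
    if not 1.5 <= first_ratio <= 2.0:
        raise ValueError("first_ratio must lie between 1.5 and 2")
    sizes = [math.ceil(first_ratio * n_features)]
    for _ in range(n_hidden - 1):
        sizes.append(max(1, math.ceil(sizes[-1] * shrink)))
    return sizes


def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Numerically stable softmax producing class probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decayed_learning_rate(initial, epoch, decay=0.95):
    """Assumed exponential schedule: multiply the rate by `decay` each epoch."""
    return initial * decay ** epoch
```

With 20 selected features and the defaults, `hidden_layer_sizes(20)` gives a first hidden layer of 30 neurons and a second of 15, and the output layer would have one node per quality category.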
When training the deep learning model, early stopping is applied to prevent overfitting and to determine the optimal number of training epochs, as follows:
The training data set is divided into a training set and a validation set, usually in a fixed proportion (for example, 80% training set and 20% validation set);
During training, the model's performance is evaluated on the validation set after every training epoch. The performance metric may be accuracy, recall, F1 value, or another suitable metric; for example, if accuracy is chosen, it is computed as the proportion of validation samples that the model predicts correctly;
A patience value is set: training stops when the model's performance on the validation set has not improved for that many consecutive epochs. For example, with a patience value of 10, training stops if the model's validation accuracy has not improved for 10 consecutive epochs;
Meanwhile, the epoch at which the model performed best on the validation set and the corresponding model parameters are recorded. Once the early stopping condition is met, the model parameters are restored to that best-performing state and used as the final trained model, which avoids overfitting from excessive training and the resulting loss of generalization ability on new data.
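The early stopping procedure just described (track the best validation score, count epochs without improvement, restore the best parameters) can be sketched as a small helper. The class name and its interface are hypothetical; the model parameters are treated as an opaque object that can be deep-copied.

```python
import copy


class EarlyStopping:
    """Track the best validation score and signal when patience runs out."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.best_params = None
        self.stale_epochs = 0  # consecutive epochs without improvement

    def update(self, epoch, score, params):
        """Record one epoch; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.best_params = copy.deepcopy(params)  # snapshot to restore later
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

After `update` returns `True`, the training loop would stop and load `best_params` as the final model. Note that in this sketch a score merely equal to the previous best counts as no improvement.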
When training the deep learning model, cross-validation is used to further improve the stability and generalization ability of the model, as follows:
The whole training data set is divided into k folds of similar size, where k is usually between 3 and 10 (for example, k=5);
For each cross-validation iteration:
One fold is selected as the validation set and the remaining k-1 folds serve as the training set;
The deep learning model is trained on the selected training set; early stopping may be used during training to determine the best model parameters for that fold;
The performance of the trained model is evaluated on the validation set and the result is recorded;
These steps are repeated until every fold has served as the validation set once;
Finally, the results of the k cross-validation iterations are aggregated, for example as the average accuracy and the average F1 value. The aggregated results give a more complete picture of the model's performance and can be used to tune the model's hyperparameters (such as the number of network layers, the number of neurons, and the learning rate) to obtain the best model configuration.
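The k-fold procedure above can be sketched with the standard library alone. This is a minimal illustration: `train_and_score` is a hypothetical callback standing in for "train the model on these indices and return its validation score", and the fixed shuffle seed is an assumption made for reproducibility.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(n_samples, train_and_score, k=5):
    """Average the score returned by `train_and_score` over all k folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Comparing the averaged score across candidate hyperparameter settings is one way to carry out the tuning step the description mentions.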
In a specific embodiment, in step five, the trained deep learning model is evaluated on an independent test data set, and its accuracy, recall, F1 value, and AUC value are calculated as evaluation metrics. Accuracy is the ratio of the number of samples the model predicts correctly to the total number of test samples, and measures the model's overall performance on the whole test data set. Recall is the ratio of the number of true positives to the sum of true positives and false negatives, and measures the model's ability to identify positive examples (such as patients with a disease), that is, the proportion of actual positives that the model correctly identifies. The F1 value is the harmonic mean of precision and recall, that is, twice the product of precision and recall divided by their sum, where precision is the ratio of true positives to all samples predicted positive; it balances how completely the model finds positive examples against how reliable its positive predictions are. The AUC value is the area under the receiver operating characteristic (ROC) curve and measures the model's overall performance across different classification thresholds; it is particularly useful for imbalanced data sets, and the closer the AUC value is to 1, the better the model performs.
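The four evaluation metrics can be made concrete for the binary case. The sketch below assumes labels are coded 1 (positive) and 0 (negative); the F1 value is computed from precision and recall, which is the standard definition, and the AUC is computed as the fraction of positive/negative pairs that the scores rank correctly (ties count as half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def f1_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a multi-category quality result, these binary metrics would typically be computed per category and then averaged; the original text does not specify which averaging scheme is used.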
In a specific embodiment, in step six, after the feature subset of the clinical trial data to be evaluated is input into the deep learning model, the model outputs a quality assessment result together with a confidence score. The confidence score quantifies how reliable the model considers its prediction (that is, the assessment result) and ranges from 0 to 1. If the confidence score is below a preset threshold, a secondary review of the assessment result is triggered automatically. The purpose of the secondary review mechanism is to improve the accuracy and reliability of the assessment result through an additional review step when the model's confidence is low, which helps reduce false positives and missed detections and improves the performance of the overall system.
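The threshold check on the confidence score can be sketched as follows. Two points are assumptions rather than statements from the original text: the confidence score is taken to be the largest Softmax class probability, and the threshold of 0.8 and the category labels are placeholder values.

```python
def assess(probabilities, labels, threshold=0.8):
    """Pick the most probable category and flag low-confidence results.

    `probabilities` are the model's class probabilities (summing to 1) and
    `labels` are the matching category names; both are hypothetical inputs.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]  # assumed definition of the confidence score
    return {
        "result": labels[best],
        "confidence": confidence,
        "needs_secondary_review": confidence < threshold,
    }
```

A result such as `assess([0.45, 0.40, 0.15], ["high", "medium", "low"])` would be routed to secondary review because its top probability is well below the threshold.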
In a specific embodiment, in step six, the expert system includes a rule base. The rule base is built from international clinical trial standards, authoritative industry guidelines, and key quality points summarized from historically successful clinical trial cases, and has a dynamic update mechanism;
The rule base is constructed as follows:
Collection and organization of standards: internationally recognized clinical trial standards, such as ICH-GCP (the Good Clinical Practice guideline of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use), are collected comprehensively. A professional team studies each provision of these standards in depth and extracts the key requirements and guidance that relate directly to clinical trial quality assessment. For example, the detailed ICH-GCP provisions on trial protocol design, subject protection, data management, and statistical analysis are organized and converted into quantifiable, executable rule entries;
Rule classification and refinement: the extracted rules are classified by the stages and key links of a clinical trial. For example, they are divided into pre-trial preparation rules (covering ethics committee approval, trial protocol registration, and so on), trial execution rules (covering subject recruitment, application of inclusion and exclusion criteria, drug management, adverse event monitoring, and so on), and post-trial rules (covering data cleaning, writing of the statistical analysis report, publication of trial results, and so on). Specific checkpoints and judgment criteria are then refined for each class of rules. Taking subject recruitment as an example, requirements such as the legitimacy of the recruitment channel, the accuracy and completeness of the recruitment information, and the standardization of the subject enrollment process are defined explicitly, and corresponding judgment thresholds or conditions are set;
Guideline screening and interpretation: industry guidelines with broad influence and authority in the clinical trial field are selected, such as the relevant guidelines issued by the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA). An expert team interprets these guidelines in depth, focusing on their specific quality requirements and recommendations for particular disease areas, treatment methods, or trial types. For example, for oncology clinical trials, the FDA and EMA guidelines on clinical trials of oncology drugs are consulted, and specific rules and key points on the selection of efficacy endpoints, patient stratification criteria, follow-up planning, and so on are extracted;
Rule integration and optimization: the rules extracted from the authoritative industry guidelines are integrated with the rules built from the international standards, avoiding duplication and ensuring that the rule set is complete and consistent. During integration, some rules are optimized and adjusted according to actual application scenarios and the latest industry trends. For example, if an industry guideline introduces a new data collection method or quality control measure that has proven more effective in practice, it is incorporated into the rule base with corresponding implementation rules, and the related rules are updated at the same time to keep the whole rule system coordinated;
Case collection and analysis: a library of historically successful clinical trial cases is established, collecting rigorously verified, high-quality trials across different disease areas and treatment modalities. An interdisciplinary team (including clinicians, statisticians, data managers, and others) analyzes these cases in depth and summarizes successful experience and key quality points along multiple dimensions, such as trial design, data collection and management, quality control measures, and result analysis and interpretation. For example, a successful cardiovascular clinical trial is analyzed to summarize its effective practices and quality assurance mechanisms for optimizing patient enrollment criteria, keeping data consistent across multiple centers, and maintaining the completeness of long-term follow-up data;
Conversion of key points into rules: the key quality points summarized from the historical cases are converted into specific rule entries and incorporated into the rule base. During conversion, the scope and conditions of each rule are defined so that it is both general and operable. For example, if a case used a special data audit procedure that ensured data accuracy, that procedure is distilled into a data audit rule specifying the audit method, frequency, and responsibilities to be adopted under similar trial conditions. A reference to the source case is attached to each rule for later lookup and interpretation.
If the relevant standards and guidelines are updated, or new representative cases appear, the rules are adjusted and supplemented within a preset time. In use, when an assessment result output by the deep learning model (such as the quality score for a particular stage of the trial or for the trial as a whole) is input into the rule-based expert system, the system further verifies and adjusts the result against the rules in the rule base; if the assessment result is inconsistent with those rules, the system may issue a warning or recommend further review.
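The way a rule base can cross-check the model's output may be easier to see in code. The sketch below is purely illustrative: the two rules, their identifiers, the trial fields they inspect, and the `"pass"` result label are all invented for the example and do not come from the original text or from any actual guideline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    stage: str  # e.g. pre-trial, execution, post-trial
    description: str
    check: Callable[[dict], bool]  # returns True when the trial satisfies the rule


# Hypothetical rule entries; a real rule base would be far larger.
RULES = (
    Rule("PRE-001", "pre-trial", "Ethics committee approval is documented",
         lambda trial: trial.get("ethics_approved") is True),
    Rule("EXE-001", "execution", "No adverse events were left unreported",
         lambda trial: trial.get("unreported_adverse_events", 0) == 0),
)


def verify(trial, model_result, rules=RULES):
    """List violated rules and flag a conflict with a passing model result."""
    violations = [rule.rule_id for rule in rules if not rule.check(trial)]
    conflict = model_result == "pass" and bool(violations)
    return {"violations": violations, "needs_further_review": conflict}
```

Keeping each rule as a separate entry with its own identifier makes it straightforward to add, retire, or revise rules when standards change, which is the kind of dynamic update the description calls for.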
In a specific embodiment, in step seven, the clinical trial quality assessment report uses a standardized document format, and its content includes basic trial information, details of each assessment result, an overall assessment conclusion, and targeted improvement suggestions. Generating the report in a standardized format systematically integrates and organizes the data involved in the assessment so that it fits the report framework and readers' reading habits. The report thereby gives interested parties such as researchers and regulatory authorities a comprehensive, objective, and reliable presentation of the assessment results; it not only helps reveal possible problems and risk points in the trial but also provides support and guidance for improving trial quality.
FIG. 3 is a schematic diagram of the functional modules of an artificial intelligence-based clinical trial quality assessment system according to an embodiment of the present application. As shown in FIG. 3, the system includes a data acquisition module, a data cleaning and sorting module, a feature extraction and selection module, a model training module, a model evaluation and optimization module, a quality assessment module, an in-depth assessment module, and a report generation module;
The data acquisition module is used to collect clinical trial data from each stage of a clinical trial, where the clinical trial data includes basic patient information, medical history, treatment process data, examination and test result data, trial results, adverse reaction records, and investigational drug information;
The data cleaning and sorting module is used to clean the collected clinical trial data by removing noisy, erroneous, and duplicate data, and to sort and classify the data according to a preset data format to construct a clinical trial data set;
The feature extraction and selection module is used to extract features from the clinical trial data set with an artificial intelligence algorithm, obtain the key features that reflect clinical trial quality, and select a feature subset for clinical trial quality assessment using a mutual-information-based feature selection algorithm and a principal component analysis algorithm, where the artificial intelligence algorithm comprises a convolutional neural network, an autoencoder neural network, and natural language processing techniques;
The model training module is used to construct a deep learning model and train it with the feature subset as input data and known clinical trial quality assessment results as output labels;
The model evaluation and optimization module is used to evaluate the trained deep learning model on an independent test data set and to optimize the model according to the evaluation results;
The quality assessment module is used to input the feature subset of the clinical trial data to be evaluated into the deep learning model to obtain a quality assessment result;
The in-depth assessment module is used to perform an in-depth assessment of clinical trials that have quality problems or are in a borderline state, using the rule-based expert system combined with manual review, to obtain an in-depth assessment and verification result;
The report generation module is used to integrate the quality assessment result with the in-depth assessment and verification result to generate a clinical trial quality assessment report.
Through the cooperation of these modules, the artificial intelligence-based clinical trial quality assessment system can assess clinical trial quality comprehensively, efficiently, and accurately, supporting the successful conduct and quality improvement of clinical trials.
For further details of how each module in the system of the above embodiment implements the technical solution, refer to the description of the artificial intelligence-based clinical trial quality assessment method in the above embodiments, which is not repeated here.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method embodiments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; all modifications, equivalents, alternatives, and improvements made within the spirit and scope of the invention are intended to be covered.

Claims (10)

CN202510062732.1A; priority date 2025-01-15; filing date 2025-01-15; A clinical trial quality assessment method and system based on artificial intelligence; Pending; CN119479958A (en)

Priority Applications (1)

Application Number; Priority Date; Filing Date; Title
CN202510062732.1A, published as CN119479958A (en); 2025-01-15; 2025-01-15; A clinical trial quality assessment method and system based on artificial intelligence

Publications (1)

Publication Number; Publication Date
CN119479958A (en); 2025-02-18

Family

ID=94570170

Family Applications (1)

Application Number; Title; Priority Date; Filing Date; Status; Publication
CN202510062732.1A; A clinical trial quality assessment method and system based on artificial intelligence; 2025-01-15; 2025-01-15; Pending; CN119479958A (en)

Country Status (1)

Country; Link
CN (1); CN119479958A (en)

Legal Events

Code; Title
PB01; Publication
SE01; Entry into force of request for substantive examination
