Posted on April 13, 2024 (author: not given)
Abstract
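Proteins are regarded as fundamental elements of life and carry out a wide range of life-sustaining functions, which makes proteomics a very important research field in modern bioinformatics. Proteins can be grouped into classes according to their functions, and proteins in the same class tend to have similar structures and similar properties, so classifying proteins is an important step toward determining their functions. With advances in biotechnology, a large number of proteins have been discovered, but only a small fraction of them have had their structures and biological functions determined experimentally, and analyzing the rapidly growing volume of protein data by experiment costs a great deal of labor and time. Classifying proteins and studying their functions with computational techniques, and thereby better understanding the theory behind the life cycle, is therefore becoming increasingly important.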
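Machine learning and neural network techniques are now widely applied to bioinformatics problems: they learn from large amounts of data to extract knowledge and then uncover the patterns behind it. In many such problems, the data are naturally represented by discrete structures such as graphs, networks, trees, or sequences. This paper takes proteins as its research object and converts each protein into a graph-structured model. Features are extracted from the protein graphs with the proposed VES (Vertex Edge Similarity) graph kernel and combined with a DNN (Deep Neural Network) to build the VES-DNN protein classification model. Experimental results show that the VES-DNN model achieves better classification performance than other graph kernels. Building on this, the paper uses multiple kernels for ensemble learning and proposes the MultiKernel-Stacking (Multiple Kernel Stacking) protein classification model, which the experimental results show outperforms the VES-DNN model.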
The main research work of this paper is as follows:
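1. The VES graph kernel is proposed. First, each row of a graph's weighted adjacency matrix is taken as the vector of the corresponding vertex; the similarity of two graphs is then measured by comparing the similarity of their vertex vectors, and the kernel value is determined by the maximum similarity between the vertices of the two graphs.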
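2. A VES-DNN protein classification model based on the VES graph kernel is proposed. The VES graph kernel yields a kernel matrix over the protein graph samples, and each row of this kernel matrix is used as the input feature vector of a neural network, which produces the classification result. Experimental results show that this model effectively improves protein classification performance.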
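3. The MultiKernel-Stacking protein classification model is proposed. Following the Stacking ensemble learning approach, the classification results of VES-DNN models built on several graph kernels are combined into a vector that serves as the input of a neural network, which produces the MultiKernel-Stacking classification result. Experimental analysis and comparison with the VES-DNN model show that this model further improves protein classification performance.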
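Items 2 and 3 describe two feature-construction steps: rows of a kernel matrix used as DNN inputs, and concatenated base-model outputs used as the input of the Stacking meta-network. The Python sketch below covers only those two steps, under stated assumptions. The function names `kernel_feature_rows` and `stacking_features` are hypothetical, the toy Gaussian kernel on numbers merely stands in for a graph kernel such as VES, and the neural networks themselves are omitted. The abstract does not say which graphs the rows are computed against; here the reference set is assumed to be the training graphs.

```python
import math
from typing import Callable, List, Sequence, TypeVar

G = TypeVar("G")  # any graph representation the kernel accepts


def kernel_feature_rows(
    samples: Sequence[G], reference: Sequence[G], kernel: Callable[[G, G], float]
) -> List[List[float]]:
    """Row i = [kernel(samples[i], reference[j]) for every j].

    ASSUMPTION: reference = training graphs. With samples == reference this is
    the kernel matrix, and each row is one sample's DNN input feature vector.
    """
    return [[kernel(x, r) for r in reference] for x in samples]


def stacking_features(
    per_kernel_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Concatenate, per sample, the class-probability vectors produced by the
    base models (one model per graph kernel) into one meta-feature vector."""
    n_samples = len(per_kernel_probs[0])
    if any(len(probs) != n_samples for probs in per_kernel_probs):
        raise ValueError("every base model must score the same samples")
    return [
        [p for probs in per_kernel_probs for p in probs[i]]
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Toy stand-in: "graphs" are plain numbers and the kernel is a Gaussian.
    def toy_kernel(a: float, b: float) -> float:
        return math.exp(-((a - b) ** 2))

    train = [1.0, 2.0, 4.0]
    X = kernel_feature_rows(train, train, toy_kernel)
    print(X[0])  # features of sample 0: [exp(0), exp(-1), exp(-9)]

    # Hypothetical outputs of two base models (2 samples, 2 classes each).
    model_a = [[0.9, 0.1], [0.3, 0.7]]
    model_b = [[0.6, 0.4], [0.2, 0.8]]
    print(stacking_features([model_a, model_b]))
    # [[0.9, 0.1, 0.6, 0.4], [0.3, 0.7, 0.2, 0.8]]
```

A common practice in stacking, not stated in the abstract, is to train the meta-network on out-of-fold predictions of the base models, so that it never sees outputs the base models produced for their own training labels.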
Keywords: protein classification; graph kernel; neural network; ensemble learning
Abstract
Proteins are considered essential elements of life and perform a variety of life-sustaining functions, which makes proteomics a very important research field in modern bioinformatics. Proteins can be classified into categories according to their functions, and proteins in the same category have similar structures and similar properties, so studying protein classification is important for determining protein functions. With the development of biotechnology, a large number of proteins have been discovered, but only a small fraction of them have been analyzed experimentally to determine their structures and corresponding biological functions. Given the rapid growth of protein data, experimental analysis requires a great deal of labor and time. It is therefore becoming increasingly important to classify proteins and study their functions with computational techniques, in order to better understand the theory behind the life cycle.
Today, machine learning and neural network techniques are widely used in bioinformatics; they learn from large amounts of data to extract knowledge and then analyze the patterns behind it. In many problems, these data can be naturally represented by discrete structures such as graphs, networks, trees, or sequences. In this paper, we take proteins as the research object, transform each protein into a graph-structured model, extract features from the protein graphs with the proposed VES (Vertex Edge Similarity) graph kernel, and combine it with a DNN (Deep Neural Network) to construct the VES-DNN protein classification model. Experimental results show that the VES-DNN model achieves better classification performance than other graph kernels. Building on this, the paper uses multiple kernels for ensemble learning and proposes the MultiKernel-Stacking (Multiple Kernel Stacking) protein classification model, which the experimental results show is superior to the VES-DNN model.
The main research work of this paper is as follows:
1. The VES graph kernel is proposed. First, each row of a graph's weighted adjacency matrix is used as the vector of the corresponding vertex; the similarity of two graphs is then measured by comparing the similarity of their vertex vectors, and the kernel value is determined by the maximum similarity between the vertices of the two graphs.
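The VES kernel description leaves open how vertex vectors from graphs with different numbers of vertices are compared, and how the per-vertex maxima are combined into a single kernel value. The Python sketch below fills those gaps with explicit assumptions: each adjacency row is sorted in descending order and zero-padded to a common length, vertex similarity is cosine similarity, and the kernel averages the best-match scores in both directions. The function names `vertex_vectors`, `cosine`, and `ves_kernel` are hypothetical, and this is not the paper's exact definition.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def vertex_vectors(adj: Matrix, dim: int) -> List[List[float]]:
    """Use each row of a weighted adjacency matrix as its vertex's vector.

    ASSUMPTION: rows are sorted in descending order and zero-padded to `dim`,
    so vectors from graphs of different sizes or vertex orderings are comparable.
    """
    vectors = []
    for row in adj:
        ordered = sorted(row, reverse=True)
        vectors.append(ordered + [0.0] * (dim - len(ordered)))
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ves_kernel(adj1: Matrix, adj2: Matrix) -> float:
    """Hypothetical VES-style kernel value for two non-empty graphs."""
    if not adj1 or not adj2:
        raise ValueError("both graphs need at least one vertex")
    dim = max(len(adj1), len(adj2))
    v1 = vertex_vectors(adj1, dim)
    v2 = vertex_vectors(adj2, dim)
    # For each vertex, keep its maximum similarity to any vertex of the other graph.
    best_1to2 = [max(cosine(a, b) for b in v2) for a in v1]
    best_2to1 = [max(cosine(b, a) for a in v1) for b in v2]
    # ASSUMPTION: average the best matches in both directions (keeps k symmetric).
    return 0.5 * (sum(best_1to2) / len(best_1to2) + sum(best_2to1) / len(best_2to1))


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(ves_kernel(triangle, triangle))  # approximately 1.0
    print(ves_kernel(triangle, path))      # approximately 0.902
```

Because the score is built from per-vertex maxima, it is not guaranteed to be positive semidefinite. That matters for methods that assume a valid kernel, and much less when the kernel matrix rows are only used as input features, as in the VES-DNN model.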