© 2014, Springer Science+Business Media New York. The increasing volume of high-throughput biological datasets is prompting the information engineering and machine learning research communities to direct more studies towards designing and applying novel methods that are sophisticated and specialised enough to tackle the problems specific to such datasets. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method tackles the problem of scrutinising multiple gene expression microarray datasets to identify the subsets of genes which are consistently co-expressed across them. It produces clustering results which better reflect the biological fact that most of the genes in any cell are expected to be irrelevant to the specific context at hand, as well as the fact that many genes might participate in multiple processes. This is achieved by clustering the given set of genes while allowing each gene one of three eventualities: to be exclusively assigned to a single cluster, to be simultaneously assigned to multiple clusters, or not to be assigned to any cluster. In this study, we expand the scope of application of the Bi-CoPaM method by applying it, for the first time, to bacterial datasets, namely a set of five Escherichia coli datasets generated under different biological conditions, in order to identify the subsets of genes which are consistently co-expressed, i.e. well correlated with each other. We identify two clusters with such consistent co-expression and, interestingly, the two clusters are themselves consistently negatively correlated with each other. The first cluster is enriched with genes participating in protein synthesis and DNA repair, while the second is enriched with transport genes. Consequently, we draw biological hypotheses that relate some of the genes whose biological processes are currently unknown to their potential processes. These hypotheses can serve as pilots for focused future gene discovery studies.
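The three eventualities described in the abstract (exclusive, multiple, or no assignment) can be illustrated with a small sketch. It is not the authors' implementation: it assumes a fuzzy consensus partition matrix is already available and applies a single hypothetical threshold rule to binarise it. The function names, the threshold value, and the toy matrix are all invented for illustration.

```python
import numpy as np


def binarise_by_threshold(membership, threshold):
    """Binarise a fuzzy consensus partition matrix (illustrative rule only).

    membership: array of shape (n_clusters, n_genes); each column holds one
    gene's fuzzy membership values across the clusters.
    A gene is assigned to every cluster whose membership reaches the threshold.
    """
    membership = np.asarray(membership, dtype=float)
    return membership >= threshold


def describe_assignments(binary):
    """Label each gene by how many clusters it was assigned to."""
    counts = binary.sum(axis=0)
    labels = np.where(
        counts == 0,
        "unassigned",
        np.where(counts == 1, "exclusive", "multiple"),
    )
    return labels.tolist()


# Toy consensus matrix (made-up values): 3 clusters (rows) x 3 genes (columns).
consensus = np.array([
    [0.9, 0.5, 0.4],
    [0.1, 0.5, 0.3],
    [0.0, 0.0, 0.3],
])

binary = binarise_by_threshold(consensus, threshold=0.5)
labels = describe_assignments(binary)
print(labels)
```

In this toy case the first gene clears the threshold in one cluster only, the second in two clusters, and the third in none, so a single binarised matrix can carry all three outcomes at once.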

Original publication

DOI

10.1007/s11265-014-0919-7

Type

Journal article

Journal

Journal of Signal Processing Systems

Publication Date

01/01/2015

Volume

79

Pages

159 - 166