A Combined Data Mining Based-Bi Clustering and Order Preserved Sub-Matrices Algorithm for Set Covering Problem

Document Type : Research Paper


1 Department of Industrial Engineering, Faculty of Engineering, Semnan University, Semnan, Iran

2 Department of industrial engineering, Semnan university, semnan, Iran

3 Department of Industrial Engineering, Semnan University, Semnan, Iran.



This study evaluates a Set Covering Problem (SCP), an extension of the demand covering problem, with several potential applications. The original demand covering problem objective includes the selection of proper locations for a number of available facilities to cover the required demand. The SCP tries to minimize location cost satisfying a specified level of coverage. The SCP problems answer many location problems, e.g., the emergency services sector with alternative facilities that will cover the unavailability of the primary facility or recommender systems where it is desired to fulfill the demand by several available choices. We present a biclustering method to construct biclusters from the distance matrix where a bicluster depicts a subset of demand centers covered by a subset of facilities. According to experiments performed in this study, it is concluded that the proposed method provides high-quality solutions compared with an optimal solution attained from GAMS. Also, for larger problem instances, the proposed method provided solutions with higher quality than GAMS software when the computational time is limited to 1 Hour.


Ayadi, W., Elloumi, M. and Hao, J. K. Bio Data mining A biclustering algorithm based on a bicluster enumeration tree: application to dna microarray data. 2009. 2(1): p. 9.
Beasley, J.E. J.J.o.G.O., Obtaining test problems via internet. 1996. 8(4): p. 429-433.
Ben-Dor, A., Chor, B., Karp, R., & Yakhini, Z. Discovering local structure in gene expression data: the order-preserving submatrix problem. 2002. 10(3-4): p. 373-384.
Bergmann, S., Ihmels, J. and Barkai, N. J.P.r.E. Iterative signature algorithm for the analysis of large-scale gene expression data. 2003. 67(3): p. 031902.
Boutsinas, B. J.I.J.o.A.I.T., A new biclustering algorithm based on association rule mining. 2013. 22(03): p. 1350017.
Boutsinas, B. and Papastergiou, T. J.P.R. On clustering tree structured data with categorical nature. 2008. 41(12): p. 3613-3623.
Brotcorne, L., Laporte, G. and Semet, F. Fast heuristics for large scale covering-location Problems, Computers and Operations Research 2002. 29(6): 651-665.
Buditjahjanto, I.A. and Miyauchi, H. J.I.J.o.I.T. D. Making, An intelligent decision support based on a subtractive clustering and fuzzy inference system for multiobjective optimization problem in serious game. 2011. 10(05): p. 793-810.
Busygin, S., Prokopyev, O., & Pardalos, P. M. Biclustering in data mining. Computers & Operations Research. 2008: 35(9), 2964-2987.
Caprara, A., Toth, P. and Fischetti, M. J.A.o.O.R. Algorithms for the set covering problem. 2000. 98(1-4): p. 353-371.
Cheng, Y. and Church G.M. Biclustering of expression data. in Ismb. 2000.
Crémilleux, B., Soulet, A., Kléma, J., Hébert, C., & Gandrillon, O.  Discovering knowledge from local patterns in sage data. In Data Mining and Medical Knowledge Management: Cases and Applications. 2009: (pp. 251-267). IGI Global.‏.
Daskin, M.J.N.Y., Network and Discrete Location Wiley. 1995.
Eiselt, H. and Marianov, V. J.S.-E.P.S. Gradual location set covering with service quality. 2009. 43(2): p. 121-130.
Farahani, R. Z., Asgari, N., Heidari, N., Hosseininia, M., & Goh, M. Covering problems in facility location: A review. Computers & Industrial Engineering. 2012: 62(1), 368-407.
Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth., P. Knowledge Discovery and Data Mining: Towards a Unifying Framework. in KDD. 1996.
Gao, B. J., Griffith, O. L., Ester, M., Xiong, H., Zhao, Q., & Jones, S. J.  On the deep order-preserving submatrix problem: A best effort approach. IEEE transactions on knowledge and data engineering. 2010: 24(2), 309-325.‏.
Garey, M.R. and Johnson, D. S. J.A.g.t.N.-c., Computer and intractability. 1979.
Glover, F.W. and Kochenberger, G. J.I.J.o.I.T. D. Making, New optimization models for data mining. 2006. 5(04): p. 605-609.
Grossman, T. and Wool, A. J.E.J.o.O.R. Computational experience with approximation algorithms for the set covering problem. 1997. 101(1): p. 81-92.
Hartigan, J.A. J.J.o.t.a.s.a., Direct clustering of a data matrix. 1972. 67(337): p. 123-129.
Karasakal, O. and Karasakal, E.K. J.C.O. Research, A maximal covering location model in the presence of partial coverage. 2004. 31(9): p. 1515-1526.
Kinney, G.W., Barnes, J.W. and Colletti, B.W. J.I.J.o.O.R. A reactive tabu search algorithm with variable clustering for the unicost set covering problem. 2007. 2(2): p. 156-172.
Kochenberger, G., Glover, F., & Alidaee, B. An effective approach for solving the binary assignment problem with side constraints. International Journal of Information Technology & Decision Making. 2002: 1(01), 121-129.
Lazzeroni, L. and Owen, A. J.S.s. Plaid models for gene expression data. 2002: p. 61-86.
Madeira, S.C. and Oliveira, A.L. J.I.A.T.o.C.B. Bioinformatics, Biclustering algorithms for biological data analysis: a survey. 2004. 1(1): p. 24-45.
Murali, T. and Kasif, S. Extracting conserved gene expression motifs from gene expression data, in Biocomputing 2003. 2002, World Scientific. p. 77-88.
Peeters, R. J.D.A.M., The maximum edge biclique problem is NP-complete. 2003. 131(3): p. 651-654.
Pirkul, H. and Schilling, D. A. J.M.S. The maximal covering location problem with capacities on total workload. 1991. 37(2): p. 233-248.
Prelić, A., Bleuler, S., Zimmermann, P., Wille, A., Bühlmann, P., Gruissem, W., ... & Zitzler, E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics. 2006: 22(9), 1122-1129.‏.
Shi, Y., Peng, Y., Xu, W., & Tang, X. Data mining via multiple criteria linear programming: applications in credit card portfolio management. International Journal of Information Technology & Decision Making. 2002: 1(01), 131-151.
Tanay, A., Sharan, R. and Shamir, R. J.B. Discovering statistically significant biclusters in gene expression data. 2002. 18(suppl_1): p. S136-S144.
Toregas, C. and ReVelle, C. J.P.i.R.S., Optimal location under time or distance constraints. 1972. 28(1): p. 131-143.
Toregas, C., Swain, R., ReVelle, C., & Bergman, L. The location of emergency service facilities. Operations research. 1971: 19(6), 1363-1373.
Wang, H. and Lin, Z. A novel algorithm for counting all common subsequences. in 2007 IEEE International Conference on Granular Computing (GRC 2007). 2007. IEEE.
Wang, H., Wang, W., Yang, J., & Yu, P. S. Clustering by pattern similarity in large data sets. in Proceedings of the 2002 ACM SIGMOD international conference on Management of data. 2002. ACM.
Xue, Y., Liao, Z., Li, M., Luo, J., Kuang, Q., Hu, X., & Li, T. A new approach for mining order-preserving submatrices based on all common subsequences. Computational and mathematical methods in medicine. 2015. 2015.‏.
Yang, F., Hua, G., Inoue, H., & Shi, J. Two bi-objective optimization models for competitive location problems. International Journal of Information Technology & Decision Making. 2006: 5(03), 531-543.‏
Yang, J., Wang, W., Wang, H., & Yu, P. /spl delta/-clusters: capturing subspace correlation in a large data set. in Proceedings 18th international conference on data engineering. 2002. IEEE.
Yang, Q. and Wu, X. J.I.J.o.I.T. D. Making, 10 challenging problems in data mining research. 2006. 5(04): p. 597-604.