
This articleneeds additional citations forverification. Please helpimprove this article byadding citations to reliable sources. Unsourced material may be challenged and removed. Find sources: "Set cover problem" – news ·newspapers ·books ·scholar ·JSTOR(September 2025) (Learn how and when to remove this message) |
Theset cover problem is a classical question incombinatorics,computer science,operations research, andcomplexity theory.
Given aset of elements{1, 2, …,n}(henceforth referred to as theuniverse, specifying all possible elements under consideration) and a collection, referred to asS, of a givenm subsets whoseunion equals the universe, the set cover problem is to identify a smallest sub-collection ofS whose union equals the universe.
For example, consider the universe,U = {1, 2, 3, 4, 5} and the collection of setsS = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} }. In this example,m is equal to 4, as there are four subsets that comprise this collection. The union ofS is equal toU. However, we can cover all elements with only two sets:{ {1, 2, 3}, {4, 5} }, see picture, but not with only one set. Therefore, the solution to the set cover problem for thisU andS has size 2.
More formally, given a universe and a family of subsets of, aset cover is a subfamily of sets whose union is.
The decision version of set covering isNP-complete. It is one ofKarp's 21 NP-complete problems shown to beNP-complete in 1972. The optimization/search version of set cover isNP-hard.[1] It is a problem "whose study has led to the development of fundamental techniques for the entire field" ofapproximation algorithms.[2]
In theweighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with a smallest weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.
In thefractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in [0,1]) to each set in, such that for each elementx in the universe, the sum of fractions of sets that containx is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universeU = {1, 2, 3} and the collection of setsS = { {1, 2}, {2, 3}, {3, 1} }. The smallest set cover has a size of 2, e.g.{ {1, 2}, {2, 3} }. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.
Theset cover problem can be formulated as the followinginteger linear program (ILP).[3]
| minimize | (minimize the number of sets) | ||
| subject to | for all | (cover every element of the universe) | |
| for all. | (every set is either in the set cover or not) |
For a more compact representation of the covering constraint, one can define anincidence matrix, where each row corresponds to an element and each column corresponds to a set, and if element e is in set s, and otherwise. Then, the covering constraint can be written as.
Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is, where is the weight of set.
Fractional set cover is described by a program identical to the one given above, except that can be non-integer, so the last constraint is replaced by.
This linear program belongs to the more general class of LPs forcovering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. Theintegrality gap of the ILP is at most (where is the size of the universe). It has been shown that itsrelaxation indeed gives a factor-approximation algorithm for the minimum set cover problem.[4] Seerandomized rounding#setcover for a detailed explanation.
The set cover problem is equivalent to the hitting set problem. A subset of is called ahitting set when for all (i.e., intersects or “hits” all subsets in). Thehitting set problem is to find a minimum hitting set for a given and.
To show that the problems are equivalent, for a universe of size and collection of sets of size, construct and. Then a set cover of is equivalent to a hitting set of where, and vice versa.
This equivalence can also be visualized by representing the problem as abipartite graph of vertices, with vertices on the left representing elements of, and vertices on the right representing elements of, and edges representing set membership (i.e., there is an edge between the-th vertex on the left and the-th vertex of the rightiff.). Then a set cover is a subset of right vertices such that each left vertex is adjacent to at least one member of, while a hitting set is a subset of left vertices such that each right vertex is adjacent to at least one member of. These definitions are exactly the same except thatleft andright are swapped. But there is nothing special about the sides in the bipartite graph; we could have put the elements of on the right side, and the elements of on the left side, creating a graph that is a mirror image of the one described above. This shows that set covers in the original graph are equivalent to hitting sets in the mirrored graph, and vice versa.
In the field ofcomputational geometry, a hitting set for a collection of geometrical objects is also called astabbing set orpiercing set.[5]
There is agreedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using abucket queue to prioritize the sets.[6] It achieves an approximation ratio of, where is the size of the set to be covered.[7] In other words, it finds a covering that may be times as large as the minimum one, where is the-thharmonic number:
This greedy algorithm actually achieves an approximation ratio of where is the maximum cardinality set of. Fordense instances, however, there exists a-approximation algorithm for every.[8]

There is a standard example on which the greedy algorithm achieves an approximation ratio of.The universe consists of elements. The set system consists of pairwise disjoint sets with sizes respectively, as well as two additional disjoint sets,each of which contains half of the elements from each. On this input, the greedy algorithm takes the sets, in that order, while the optimal solution consists only of and.An example of such an input for is pictured on the right.
Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms(seeInapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly.[9]
If each element occurs in at mostf sets, then a solution can be found in polynomial time that approximates the optimum to within a factor off usingLP relaxation.
If the constraint is replaced by for allS in in the integer linear program shownabove, then it becomes a (non-integer) linear programL. The algorithm can be described as follows:
When refers to the size of the universe,Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of, unlessNP hasquasi-polynomial time algorithms.Feige (1998) improved this lower bound to under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm.Raz & Safra (1997) established a lower boundof, where is a certain constant, under the weaker assumption thatPNP.A similar result with a higher value of was recently proved byAlon, Moshkovitz & Safra (2006).Dinur & Steurer (2013) showed optimal inapproximability by proving that it cannot be approximated to unlessPNP.
In low-frequency systems,Dinur et al. (2003) proved it is NP-hard to approximate set cover to better than.If theUnique games conjecture is true, this can be improved to as proven byKhot & Regev (2008).
Trevisan (2001) proves that set cover instances with sets of size at most cannot be approximated to a factor better than unlessPNP, thus making the approximation of of the greedy algorithm essentially tight in this case.
This sectionneeds expansion. You can help byadding to it.(November 2017) |
Relaxing the integer linear program for weighted set cover statedabove, one may userandomized rounding to get an-factor approximation. Non weighted set cover can be adapted to the weighted case.[11]
This sectionneeds expansion. You can help byadding to it.(September 2023) |
{{citation}}: CS1 maint: location missing publisher (link)