Set cover problem

From Wikipedia, the free encyclopedia

Classical problem in combinatorics

Example of an instance of set cover problem.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

Given a set of elements {1, 2, …, n} (henceforth referred to as the universe, specifying all possible elements under consideration) and a collection S of m given subsets whose union equals the universe, the set cover problem is to identify a smallest sub-collection of S whose union equals the universe.

For example, consider the universe U = {1, 2, 3, 4, 5} and the collection of sets S = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} }. In this example, m is equal to 4, as there are four subsets in the collection. The union of S is equal to U. However, all elements can be covered with only two sets, { {1, 2, 3}, {4, 5} } (see picture), but not with only one set. Therefore, the solution to the set cover problem for this U and S has size 2.
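The claim in this example can be checked by exhaustive search. The following is a minimal sketch, not an efficient algorithm: the function name `minimum_set_cover` is an assumption made for illustration, and the search takes exponential time, so it is only suitable for tiny instances.

```python
from itertools import combinations


def minimum_set_cover(universe, subsets):
    """Return a smallest sub-collection of `subsets` whose union is `universe`.

    Tries every sub-collection in order of increasing size, so the first
    cover found is a smallest one. Returns None if no cover exists.
    """
    universe = set(universe)
    for size in range(len(subsets) + 1):
        for candidate in combinations(subsets, size):
            if set().union(*candidate) == universe:
                return list(candidate)
    return None


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]

cover = minimum_set_cover(U, S)
print(len(cover), cover)  # 2 [{1, 2, 3}, {4, 5}]
```

No single set equals U, so the search only succeeds once it reaches sub-collections of size 2.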

More formally, given a universe ${\mathcal {U}}$ and a family ${\mathcal {S}}$ of subsets of ${\mathcal {U}}$, a set cover is a subfamily ${\mathcal {C}}\subseteq {\mathcal {S}}$ of sets whose union is ${\mathcal {U}}$.

In the set cover decision problem, the input is a pair $({\mathcal {U}},{\mathcal {S}})$ and an integer $k$; the question is whether there is a set cover of size $k$ or less.
In the set cover optimization problem, the input is a pair $({\mathcal {U}},{\mathcal {S}})$, and the task is to find a set cover that uses the fewest sets.

The decision version of set covering is NP-complete. It is one of Karp's 21 NP-complete problems, shown to be NP-complete in 1972. The optimization/search version of set cover is NP-hard.^[1] It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.^[2]

Variants

In the weighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with the smallest total weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.

In the fractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in [0,1]) to each set in ${\mathcal {S}}$, such that for each element x in the universe, the sum of fractions of sets that contain x is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe U = {1, 2, 3} and the collection of sets S = { {1, 2}, {2, 3}, {3, 1} }. The smallest set cover has a size of 2, e.g. { {1, 2}, {2, 3} }. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.
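The fractional example can be verified directly from the definition. The sketch below uses exact rational arithmetic; the helper name `is_fractional_cover` is an assumption made for illustration.

```python
from fractions import Fraction


def is_fractional_cover(universe, subsets, fractions):
    """Check that, for every element, the fractions of the sets containing it sum to at least 1."""
    return all(
        sum(x for s, x in zip(subsets, fractions) if e in s) >= 1
        for e in universe
    )


U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {3, 1}]

half_each = [Fraction(1, 2)] * 3          # take half of every set
integral = [Fraction(1), Fraction(1), Fraction(0)]  # the ordinary cover {1,2}, {2,3}

print(is_fractional_cover(U, S, half_each), sum(half_each))  # True 3/2
print(is_fractional_cover(U, S, integral), sum(integral))    # True 2
```

Each element lies in exactly two of the three sets, so taking half of every set gives each element a total of exactly 1.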

Covering/packing-problem pairs

Covering problems	Packing problems
Minimum set cover	Maximum set packing
Minimum edge cover	Maximum matching
Minimum vertex cover	Maximum independent set
Bin covering	Bin packing
Polygon covering	Rectangle packing

Linear program formulation

The set cover problem can be formulated as the following integer linear program (ILP).^[3]

minimize	$\sum _{s\in {\mathcal {S}}}x_{s}$		(minimize the number of sets)
subject to	$\sum _{s\colon e\in s}x_{s}\geqslant 1$	for all $e\in {\mathcal {U}}$	(cover every element of the universe)
	$x_{s}\in \{0,1\}$	for all $s\in {\mathcal {S}}$	(every set is either in the set cover or not)

For a more compact representation of the covering constraint, one can define an incidence matrix $A$, where each row corresponds to an element and each column corresponds to a set, and $A_{e,s}=1$ if element $e$ is in set $s$, and $A_{e,s}=0$ otherwise. Then, the covering constraint can be written as $Ax\geqslant 1$.
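To make the matrix form concrete, the sketch below builds the incidence matrix for the instance U = {1, 2, 3, 4, 5}, S = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} } and tests the constraint $Ax\geqslant 1$ for two 0/1 vectors. The function names are assumptions made for illustration, and plain nested lists stand in for a matrix type.

```python
def incidence_matrix(universe, subsets):
    """Rows are elements (in sorted order), columns are sets; entry is 1 if the element is in the set."""
    return [[1 if e in s else 0 for s in subsets] for e in sorted(universe)]


def satisfies_covering_constraint(A, x):
    """Check A x >= 1 componentwise, i.e. every element is covered by the selection x."""
    return all(sum(a * xi for a, xi in zip(row, x)) >= 1 for row in A)


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
A = incidence_matrix(U, S)

for row in A:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 0, 1, 0]
# [0, 1, 1, 1]
# [0, 0, 0, 1]

print(satisfies_covering_constraint(A, [1, 0, 0, 1]))  # True: {1,2,3} and {4,5}
print(satisfies_covering_constraint(A, [1, 1, 0, 0]))  # False: element 5 is uncovered
```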

Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is $\sum _{s\in {\mathcal {S}}}w_{s}x_{s}$ , where $w_{s}$ is the weight of set $s\in {\mathcal {S}}$ .

Fractional set cover is described by a program identical to the one given above, except that $x_{s}$ can be non-integer, so the last constraint is replaced by $0\leq x_{s}\leq 1$ .

This linear program belongs to the more general class of LPs for covering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. The integrality gap of the ILP is at most $\log n$ (where $n$ is the size of the universe). It has been shown that its relaxation indeed gives a factor-$\log n$ approximation algorithm for the minimum set cover problem.^[4] See the set cover section of the article on randomized rounding for a detailed explanation.

Hitting set formulation

The set cover problem is equivalent to the hitting set problem. A subset $H$ of $U$ is called a hitting set when $H\cap S_{j}\neq \emptyset$ for all $1\leq j\leq m$ (i.e., $H$ intersects or “hits” all subsets in $S$). The hitting set problem is to find a minimum hitting set $H$ for a given $U$ and $S$.

To show that the problems are equivalent, for a universe $U$ of size $n$ and collection of sets $S$ of size $m$, construct $U'=\{1,2,\ldots ,m\}$ and, for each element $i\in U$, the set $S'_{i}=\{j\mid i\in S_{j}\}$ of indices of the sets that contain $i$. Then a set cover $C$ of $S$ is equivalent to a hitting set $H'$ of $U'$ where $S_{j}\in C\iff j\in H'$, and vice versa.

This equivalence can also be visualized by representing the problem as a bipartite graph of $n+m$ vertices, with $n$ vertices on the left representing elements of $U$, and $m$ vertices on the right representing elements of $S$, and edges representing set membership (i.e., there is an edge between the $i$-th vertex on the left and the $j$-th vertex on the right if and only if $i\in S_{j}$). Then a set cover is a subset $C$ of right vertices such that each left vertex is adjacent to at least one member of $C$, while a hitting set is a subset $H$ of left vertices such that each right vertex is adjacent to at least one member of $H$. These definitions are exactly the same except that left and right are swapped. But there is nothing special about the sides in the bipartite graph; we could have put the elements of $U$ on the right side, and the elements of $S$ on the left side, creating a graph that is a mirror image of the one described above. This shows that set covers in the original graph are equivalent to hitting sets in the mirrored graph, and vice versa.
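The construction of $U'$ and $S'_{i}$ can be sketched in a few lines. The code below assumes elements are numbered 1 to n and sets are numbered 1 to m, as in the text; the function names are hypothetical. It builds the dual instance for the running example and compares "is a set cover" with "is a hitting set of the dual instance" on a few index sets.

```python
def dual_instance(n, subsets):
    """Build U' = {1..m} and S'_i = {j : i in S_j} for every element i in 1..n."""
    m = len(subsets)
    dual_universe = set(range(1, m + 1))
    dual_sets = [
        {j for j, s in enumerate(subsets, start=1) if i in s}
        for i in range(1, n + 1)
    ]
    return dual_universe, dual_sets


def is_set_cover(n, subsets, chosen):
    """`chosen` is a set of 1-based set indices; check that their union is {1..n}."""
    covered = set().union(*(subsets[j - 1] for j in chosen))
    return covered == set(range(1, n + 1))


def is_hitting_set(sets, hitting):
    """Check that `hitting` intersects every set in `sets`."""
    return all(hitting & s for s in sets)


S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
dual_universe, dual_sets = dual_instance(5, S)
print(dual_sets)  # [{1}, {1, 2}, {1, 3}, {2, 3, 4}, {4}]

for chosen in ({1, 4}, {1, 2}, {2, 3, 4}):
    print(chosen, is_set_cover(5, S, chosen), is_hitting_set(dual_sets, chosen))
# {1, 4} True True
# {1, 2} False False
# {2, 3, 4} False False
```

The two predicates agree on every index set because both state that each element i has some chosen j with i in S_j.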

In the field of computational geometry, a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set.^[5]

Greedy algorithm

There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a bucket queue to prioritize the sets.^[6] It achieves an approximation ratio of $H(n)$, where $n$ is the size of the set to be covered.^[7] In other words, it finds a covering that is at most $H(n)$ times as large as the minimum one, where $H(n)$ is the $n$-th harmonic number: $H(n)=\sum _{k=1}^{n}{\frac {1}{k}}\leq \ln {n}+1$

This greedy algorithm actually achieves an approximation ratio of $H(s^{\prime })$, where $s^{\prime }$ is the maximum cardinality of a set in $S$. For $\delta$-dense instances, however, there exists a $c\ln {m}$-approximation algorithm for every $c>0$.^[8]
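The greedy rule can be sketched as follows. This is a simple version that rescans every set at each stage, so it does not achieve the linear running time of the bucket-queue implementation; the function name and the choice to return set indices are assumptions made for illustration. Ties are broken in favour of the earliest set.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the set covering the most still-uncovered elements.

    Returns the 0-based indices of the chosen sets, in the order chosen.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first index attaining the maximum, so ties go to the earliest set.
        best = max(range(len(subsets)), key=lambda j: len(subsets[j] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(U, S))  # [0, 3]
```

On this instance the greedy choice happens to be optimal: {1, 2, 3} covers three elements first, after which {4, 5} is the only set covering two of the remaining ones.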

Tight example for the greedy algorithm with k=3

There is a standard example on which the greedy algorithm achieves an approximation ratio of $\log _{2}(n)/2$. The universe consists of $n=2^{k+1}-2$ elements. The set system consists of $k$ pairwise disjoint sets $S_{1},\ldots ,S_{k}$ with sizes $2,4,8,\ldots ,2^{k}$ respectively, as well as two additional disjoint sets $T_{0},T_{1}$, each of which contains half of the elements from each $S_{i}$. On this input, the greedy algorithm takes the sets $S_{k},\ldots ,S_{1}$, in that order, while the optimal solution consists only of $T_{0}$ and $T_{1}$. An example of such an input for $k=3$ is pictured on the right.
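The behaviour on this family can be reproduced with a short experiment. The sketch below builds the instance for a given k (the element numbering and the function names are assumptions made for illustration) and runs a plain greedy routine on it.

```python
def tight_instance(k):
    """Build S_1..S_k (sizes 2, 4, ..., 2^k) plus T_0, T_1, each holding half of every S_i."""
    blocks, t0, t1 = [], set(), set()
    next_element = 0
    for i in range(1, k + 1):
        block = list(range(next_element, next_element + 2 ** i))
        next_element += 2 ** i
        half = len(block) // 2
        blocks.append(set(block))
        t0.update(block[:half])
        t1.update(block[half:])
    universe = set(range(next_element))
    return universe, blocks + [t0, t1]  # indices 0..k-1 are S_1..S_k, then T_0, T_1


def greedy_set_cover(universe, subsets):
    """Greedy rule: always take the set with the most uncovered elements."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(range(len(subsets)), key=lambda j: len(subsets[j] & uncovered))
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


universe, subsets = tight_instance(3)
print(len(universe))                        # 14, i.e. 2^(3+1) - 2
print(greedy_set_cover(universe, subsets))  # [2, 1, 0]: S_3, then S_2, then S_1
print(subsets[3] | subsets[4] == universe)  # True: T_0 and T_1 alone form a cover
```

At every stage the largest remaining $S_{i}$ has $2^{i}$ uncovered elements while $T_{0}$ and $T_{1}$ each have only $2^{i}-1$, so greedy never selects them and ends with k sets instead of 2.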

Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly $\ln {n}-\ln {\ln {n}}+\Theta (1)$.^[9]

Low-frequency systems

If each element occurs in at most f sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of f using LP relaxation.

If the constraint $x_{S}\in \{0,1\}$ is replaced by $x_{S}\geq 0$ for all S in ${\mathcal {S}}$ in the integer linear program shown above, then it becomes a (non-integer) linear program L. The algorithm can be described as follows:

Find an optimal solution O for the program L using some polynomial-time method of solving linear programs.
Pick all sets S for which the corresponding variable $x_{S}$ has value at least 1/f in the solution O.^[10]
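The rounding step can be illustrated without an LP solver by supplying a fractional solution by hand. The sketch below assumes the solution `x` has already been computed elsewhere; the function names are hypothetical. It uses the triangle instance U = {1, 2, 3}, S = { {1, 2}, {2, 3}, {3, 1} }, where every element occurs in f = 2 sets and $x_{S}=1/2$ for every set is an optimal solution of L with value 3/2.

```python
from fractions import Fraction


def max_frequency(universe, subsets):
    """f = the largest number of sets that any single element occurs in."""
    return max(sum(1 for s in subsets if e in s) for e in universe)


def round_lp_solution(universe, subsets, x):
    """Pick every set whose LP value is at least 1/f (x is assumed to be feasible for L)."""
    f = max_frequency(universe, subsets)
    return [j for j, xj in enumerate(x) if xj >= Fraction(1, f)]


U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {3, 1}]
x = [Fraction(1, 2)] * 3  # assumed LP optimum, objective value 3/2

chosen = round_lp_solution(U, S, x)
print(chosen)                                       # [0, 1, 2]
print(set().union(*(S[j] for j in chosen)) == U)    # True: the rounded selection is a cover
print(len(chosen) <= max_frequency(U, S) * sum(x))  # True: 3 <= 2 * 3/2
```

The result is always a cover because each element lies in at most f sets whose values sum to at least 1, so at least one of them has value at least 1/f. Each picked set has value at least 1/f, so the number of picked sets is at most f times the LP objective, which is itself at most the size of an optimal cover.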

Inapproximability results

When $n$ refers to the size of the universe, Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of ${\tfrac {1}{2}}\log _{2}{n}\approx 0.72\ln {n}$, unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to ${\bigl (}1-o(1){\bigr )}\cdot \ln {n}$ under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. Raz & Safra (1997) established a lower bound of $c\cdot \ln {n}$, where $c$ is a certain constant, under the weaker assumption that P $\neq$ NP. A similar result with a higher value of $c$ was proved by Alon, Moshkovitz & Safra (2006). Dinur & Steurer (2013) showed optimal inapproximability by proving that it cannot be approximated to ${\bigl (}1-o(1){\bigr )}\cdot \ln {n}$ unless P $=$ NP.

In low-frequency systems, Dinur et al. (2003) proved it is NP-hard to approximate set cover to better than $f-1-\epsilon$. If the Unique games conjecture is true, this can be improved to $f-\epsilon$ as proven by Khot & Regev (2008).

Trevisan (2001) proves that set cover instances with sets of size at most $\Delta$ cannot be approximated to a factor better than $\ln \Delta -O(\ln \ln \Delta )$ unless P $=$ NP, thus making the approximation of $\ln \Delta +1$ of the greedy algorithm essentially tight in this case.

Weighted set cover

Relaxing the integer linear program for weighted set cover stated above, one may use randomized rounding to get an $O(\log n)$-factor approximation. Algorithms for unweighted set cover can be adapted to the weighted case.^[11]
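One common way to adapt the greedy rule to weights, offered here as an illustration rather than as the method of the cited source, is to pick at each stage the set with the smallest weight per newly covered element. The sketch below assumes integer (or rational) weights so that ratios can be compared exactly; the function name and the example weights are hypothetical.

```python
from fractions import Fraction


def weighted_greedy_set_cover(universe, subsets, weights):
    """At each stage pick the set minimizing weight / (number of newly covered elements).

    Returns the 0-based indices of the chosen sets, in the order chosen.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Only sets that cover something new are candidates.
        candidates = [j for j, s in enumerate(subsets) if s & uncovered]
        if not candidates:
            raise ValueError("the subsets do not cover the universe")
        best = min(
            candidates,
            key=lambda j: Fraction(weights[j], len(subsets[j] & uncovered)),
        )
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
w = [1, 2, 2, 1]  # example weights, chosen for illustration

chosen = weighted_greedy_set_cover(U, S, w)
print(chosen, sum(w[j] for j in chosen))  # [0, 3] 2
```

With all weights equal to 1 the rule reduces to the unweighted greedy choice of the set with the most uncovered elements.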