CLOSURE SYSTEM
K. ADARICHEVA, J. B. NATION, AND R. RAND
Abstract. Closure system on a finite set is a unifying concept in logic pro- gramming, relational data bases and knowledge systems. It can also be pre- sented in the terms of finite lattices, and the tools of economic description of a finite lattice have long existed in lattice theory. We present this approach by describing the so-calledD-basis and introducing the concept ofordered direct basis of an implicational system. A direct basis of a closure operator, or an implicational system, is a set of implications that allows one to compute the closure of an arbitrary set by a single iteration. This property is preserved by theD-basis at the cost of following a prescribed order in which implications will be attended. In particular, using an ordered direct basis allows to opti- mize theforward chaining procedurein logic programming that uses the Horn fragment of propositional logic. One can extract theD-basis from any direct unit basis Σ in time polynomial in the sizes(Σ), and it takes only linear time of the cardinality of theD-basis to put it into a proper order. We produce examples of closure systems on a 6-element set, for which the canonical basis of Duquenne and Guigues is not ordered direct.
1. Introduction
In K. Bertet and B. Monjardet [5], it is shown that five implicational bases for a closure operator on a finite set, found in various contexts in the literature, are actually the same. The goal of this paper is to demonstrate that standard lattice-theoretic results about the “most economical way” to describe the structure of a finite lattice may be transformed into a basis for a closure system naturally associated with that lattice.
The coding of a finite lattice in the form of a so-called OD-graph was first suggested in [16]. We will call the basis directly following from thisOD-graph aD- basis, since it is closely associated with aD-relation on the set of join-irreducibles of a lattice (not necessarily finite) that was crucial in the studies of free and lower bounded lattices, see [9]. The definition and the proof that D-basis does define a given closure system are given in section 4.
TheD-basis is a subset of a so-calleddependence relation basis (Definition 6 in [5]). Thus, it is also a subset of thecanonical direct unit basis that unifies the five bases discussed in [5]. In section 5, we give an example to demonstrate that the reverse inclusion does not hold, thus showing that this newly introducedD-basis is generally shorter than the existing ones.
2010Mathematics Subject Classification. 06A15, 08A70, 06B99.
Key words and phrases. Closure operator, system of implications, lattice of closed sets, Horn formula, Horn Boolean function, forward chaining, direct basis, canonical basis.
The first author was partially supported by AWM-NSF Mentor Travel grant N0839954.
1
arXiv:1110.5805v3 [math.CO] 13 May 2012
Recall that the main desirable feature of bases from [5] is that they be direct, which means that the computation of the closure of any subset can be done by attending each implication from the basis only once. This makes the computation of closures a one-iteration process.
While the D-basis is not direct in this meaning of this term, the closures can still be computed in a single iteration of the basis, provided the basis was put in a specific order prior to computation. Moreover, there is a simple and effective linear time algorithm for ordering a D-basis appropriately. Thus, applying the D-basis can be compared to the iteration known in artificial intelligence as the forward chaining algorithm, see for example [12].
We introduce the definition of ordered iteration and ordered direct basis in sec- tion 6, where we also prove that the D-basis is ordered direct and discuss the algorithmic aspects of ordering it. The further directions of optimization of D-basis are outlined in section 8, where we also introduce the notion of an ordered direct sequence built from a given basis of a closure system.
In section 9, we also discuss the so-called E-relation, introduced in [9], which leads to the definition of theE-basis in closure systemswithoutD-cycles. In general, the implications written from the E-relation do not necessarily form a basis of a closure system, but in closure systems without D-cycles, the E-basis is ordered direct, is contained in theD-basis, and often shorter than theD-basis. We discuss a polynomial time algorithm for ordering theE-basis.
We explore the connections betweenD-basis,E-basis and the so-calledcanonical basis introduced by Duquenne and Guigues in [11]. While the canonical basis has the minimal number of implications among all the bases of a closure system, it does not have the feature ofD-basis orE-basis discussed in this paper, namely, it cannot be turned into an ordered direct basis. Section 10 of our paper presents examples of closure systems on a 6-element set, for which the canonical basis cannot be ordered.
As a result, the time required for one iteration of D-basis wins over at least two iterations of the canonical basis. Further polynomial-time optimizations of both D-basis and the canonical basis are discussed.
Section 7 is devoted to discussion and testing the forward chaining algorithm in comparison to the ordered direct basis algorithm. Section 11 provides test results comparing the performance of theD-basis with the Duquenne-Guigues canonical basis and canonical direct unit basis.
The next two sections contain the required definitions and establish connections between finite lattices, closure operators, implicational systems, Horn formulas and Horn Boolean functions. The reader may consult the survey [4] for various aspects of closure systems on finite sets.
2. Lattices and closure operators
By alattice, one means an algebra with two binary operations∧,∨, called meet and join, respectively. Both operations are idempotent and symmetric and are connected by absorbtion laws: x∨(x∧y) = xand x∧(x∨y) = x. These laws allow us to define a partially order on the base set of the lattice: x6yiffx∧y=x (which is equivalent to x∨y =y). Vice versa, every partially ordered set, where every two elements have a least upper bound and a greatest lower bound, is, in effect, a lattice. Indeed, in this case the operation ∨ can be defined as the least upper bound, and ∧as the greatest lower bound of two elements in the poset. A
lattice is finite when the base set of this algebra is finite. The symbols V ,W
are used when more than two elements meet or join. We will use the notation 0 for the least element of a lattice, and 1 for its greatest element. Ifa6b in latticeL, then we denote by [a, b] the interval inL, i.e., the set of allcsatisfyinga6c6b.
Recall now the standard connection between a closure operator on a set and the lattice of its closed sets. Given a non-empty setS and the setP(S) = 2S of all its subsets,a closure operator is a mapφ:P(S)→P(S) that satisfies the following, for allX, Y ∈P(S):
(1) increasing: X⊆φ(X);
(2) isotone: X⊆Y impliesφ(X)⊆φ(Y);
(3) idempotent: φ(φ(X)) =φ(X).
It would be convenient for us to refer to the pair hS, φiof a set S and a closure operator on it as aclosure system.
A subsetX ⊆S is calledclosed ifφ(X) =X. The collection of closed subsets of closure operatorφ onS forms a lattice, which is usually called theclosure lattice of the closure systemhS, φi. This paper deals with only finite closure systems and finite lattices.
Conversely, we can associate with every finite latticeLa particular closure system hS, φiin such a way thatLis isomorphic to a closure lattice of that closure system.
Consider J(L) ⊆ L, a subset of join-irreducible elements. An element j ∈ L is called join-irreducible, if j6= 0, and j =a∨b impliesa=j or b=j. We define a closure system withS=J(L) and the following closure operator:
φ(X) = [0,_
X]∩J(L)
It is straightforward to check that the closure lattice ofφis isomorphic toL.
Example 1. Consider a simple example illustrating a closure system built from the latticeL={0, a, b, c,1}, for which0< a < b <1,0< c <1,a∨c=b∨c= 1 and a∧c=b∧c= 0. ThenS =J(L) ={a, b, c}. The closed subsets are[0, x]∩J(L) for x∈ L, which are ∅, {a}, {c},{a, b} and {a, b, c}. Knowing all closed subsets, one can define a closure of X, or φ(X), as the smallest closed set containing X.
For example, φ({b}) ={a, b}.
There are infinitely many sets and closure operators whose closure lattice is isomorphic to a givenL. On the other hand, the one just described is the unique one with two additional properties:
(1) φ(∅) =∅;
(2) φ({i})\ {i} is closed, for everyi∈S.
Condition (2) just says that eachφ({i}) is join irreducible. Note that (1) is a special case of (2), and that (2) implies the property
(3) φ({i}) =φ({j}) impliesi=j, for anyi, j∈S.
Note that S
i∈Xφ({i}) ⊆ φ(X), but the inverse inclusion does not necessarily hold. In Example 1, for instance, φ({a})∪φ({c})⊂φ({a, c}), sinceb belongs to the right side and not to the left side.
We will call a closure system with properties (1), (2) above a standard closure system. Closure systems with (2) are called (T12) closure spaces in Wild [17].
A closure system satisfying property (3) is said to be reduced. Note that (3) implies |φ(∅)| 6 1. Reduced closure systems correspond to a representation of a
latticeLas a closure system on a setSwithJ(L)⊆S⊆Landφ(X) = [0,W X]∩S.
A natural example is the set of principal congruences in the congruence lattice of a finite algebra. Every standard closure system is reduced, and reduced closure systems form a useful intermediate ground between standard and general systems.
It is straightforward to verify that the standard system is characterized by the property that the setS is of the smallest possible size. In other words, one cannot reduce S to define an equivalent closure system. On the other hand, the reduced systems might have excessive elements inS.
Example 2. Consider again lattice L = {0, a, b, c,1} from Example 1. It will represent the closure lattice on S1 ={a, b, c, d}, where the closed sets are ∅, {a}, {c},{a, b} and{a, b, c, d}. Thus, in this representationJ(L)⊂S1, and property (2) fails: φ({d})\ {d}={a, b, c} is not closed. On the other hand, property (3) holds, thus, it is a reduced closure system. Apparently,S1 can be reduce by element d, to get an equivalent representation of Example 1.
If the closure system hS, φi is not reduced, one can modify it to produce an equivalent one that is reduced. Moreover, there is an effective algorithm for doing so. Thus, for all practical purposes, one can work with a reduced closure system hU, µireplacing a given onehS, φi. Slightly more effort yields an equivalent standard closure systemhV, νi. The transition is described as follows.
Ifφ(∅) =A⊆SinhS, φi, then defineT =S\A, and redefine a closure operator:
τ(Y) = φ(Y)\A, for all Y ⊆T. The closure system hT, τi satisfies property (1).
As (1) is required for a standard closure system, but not for a reduced system, this step may be omitted if only the latter is sought.
Next define an equivalence relation≈onT byx≈y if and only if τ(x) =τ(y).
Then factor out≈, lettingU =T /≈andµ(Y) =τ(Y)/≈forY ⊆U. Alternately, we could defineU to be a set of representatives forT /≈andµto be the restriction ofτ. Either way, one easily checks thatµis a well-defined closure operator onU, and that the closure lattice ofhU, µiis isomorphic to that ofhS, φi. At this point, hU, µiis reduced. Moreover, we can recover the original systemhS, φiby expanding the equivalence classes and adding back inφ(∅). If desired, we can now continue to produce an equivalent standard closure system.
Let V = {u ∈ U : µ({u})\ {u} is closed}, that is, u /∈ µ(µ({u})\ {u}), and for Z ⊆ V letν(Z) = µ(Z)∩V. It is straightforward to verify that hV, νi is a closure system satisfying (1) and (2), and that the lattice of closed sets ofhV, νiis isomorphic to that ofhU, νi.
For the sequel, we will consider primarily reduced closure systems. Given an arbitrary closure system, not necessarily reduced, the above reduction can be con- sidered as a setup process to allow us to apply the D-basis and related methods.
3. The bases of closure systems, Horn formulas and Horn Boolean functions
Ify ∈φ(X), then this relation between an elementy ∈S and a subset X ⊆S in a closure system can be written in the form of implication: X →y. Thus, the closure systemhS, φican be replaced by the set of implications:
Σφ={X →y:y∈S, X ⊆S and y∈φ(X)}
Conversely, any set of implications Σ defines a closure system: the closed sets are exactly subsetsY ⊆S that respect the implications from Σ, i.e., ifX →y is in Σ, andX ⊆Y, theny∈Y.
It is convenient to define an implication X → y as any ordered pair (X, y), X ⊆ S, y ∈ S, especially having in mind its interpretation as a propositional formula, see this section two paragraphs below. On the other hand, from the point of view of closure systems, any single implication X → x, with x ∈ X, defines a trivial closure system, where all subsets of S are closed. If such implication is present in the set of implications Σ, then it can be removed without any change to the family of closed sets that Σ defines. We will assume throughout the paper that implicationsX →x, wherex∈X, are not included in the set of implications defining closure systems.
Two sets of implications Σ and Σ0on the same setSare calledequivalent, if they define the same closure system onS. The termbasisis used for a set of implications Σ0satisfying some minimality condition; thus there may be different types of bases.
Note that, in general, one can consider implications of the formX →Y, whereY is not necessarily a one-element subset ofS. Following [5], we will call basis Σ aunit implicational basis if|Y|= 1 for all implications X →Y in Σ. We will mostly be concerned with unit implicational bases, except for the discussion of the canonical basis of Duquenne-Guigues and its comparison with D-basis andE-basis. Given any unit basis, we can always collapse the implications with the same premise into one with all conclusions combined into a single set. This will be called anaggregated basis.
For a set of implications Σ = {X1 → Y1, . . . , Xm → Ym}, define the size by s(Σ) =Pm
j=1(|Xj|+|Yj|). This is one convenient measure of the complexity of an implicational system.
In general, implicationsX →y, whereX ⊆S andy∈S, can be treated as the formulas of propositional logic over the set of variablesS, equivalent toy∨W
x∈X¬x.
Formulae of this form are also called definite Horn clauses. More generally, Horn clauses are disjunctions of negations of several literals and at most one positive literal. The presence of a positive literal makes a Horn clause definite. A Horn formula is a conjunction of Horn clauses.
What is called amodel of a definite Horn clause in logic programming literature corresponds to a closed set of the closure operator defined by this clause. Indeed, by the definition, a model of any formula is simply a tuplem∈2S of zeros and ones assigned to literals fromS, such that the formula is true (=1) on this assignment.
For the definite Horn clauseX→y,mcorresponds to a subsetY ofSthat is closed for a closure operator onS defined byX →y. In fact,m is just the characteristic function ofY.
There is also a direct correspondence between Horn formulas and Horn Boolean functions: a Boolean functionf :{0,1}n→ {0,1}is called a (pureor definite)Horn function, if it has some CNF representation given by a (definite) Horn formula Σ.
The dual definition is sometimes used in the literature, so that a Horn function is given by some formula in DNF, whose negation is a Horn formula [6]. Using either definition, one can translate many results on Horn Boolean functions to the language of closure operators, see more details in [5].
Consider a set Σ of Horn clauses over some finite set of literalsS={x1, . . . , xn}.
If some Horn clauseαin Σ is not definite, i.e., is of the formW
x∈X¬x,X⊂S, and
it does not use all literals fromS, then we could define the set of definite clauses Σα = {X → y : y ∈ S\X}. It is easy to observe that the set of models of Σα
consists of all models ofW
x∈X¬xand one additional model, which is a tuple of all ones, representing the setS itself. If some clauseβ∈Σ is not definite and uses all the literals from S, then we define Σβ =∅ (another possibility, x1 →x1). Again, the set of models Σβ, i.e., all tuples of zeros and ones, extends the models ofβ by single tuple of all ones. It follows that the set of definite clauses Σ0, where each non-definite clauseαfrom Σ is replaced by a set of clauses Σα, has the set of models that extends the set of models of Σ by a single tuple of all ones. This includes the case when Σ has no models, i.e., when it isinconsistent.
This observation allows us to reduce the solution of various questions about sets of Horn clauses to sets ofdefiniteHorn clauses. Thus, it emphasizes the importance of the study of closure operators onS.
One of the important questions in logic programming is whether one clause φ is aconsequence of the set (or conjunction) of clauses Σ. Denoted by Σ|=φ, this means that every model of Σ is also a model of φ. If φ and formulas in Σ are Horn clauses, then, translating this question to the language of closure systems, one reduces it to checking whether every closed set of a closure system defined by Σ respectsφ.
4. The D-basis
In this section we are going to define a basis that translates to the language of closure systems the defining relations of a finite lattice developed in the lattice theory framework. One can consult [9] for the corresponding notion of a minimal cover andD-relation used in the theory of free lattices and lower bounded lattices.
Given areduced closure systemhS, φi, let us define two auxiliary relations. The first relation is between the subsets of S: we write X Y, if for every x ∈ X there isy∈Y satisfyingx∈φ(y). In Example 1, for instance, we have{a} {b}, {a, c} {b, c} and{c} {a, c}. Note thatX ⊆Y impliesX Y. We also write X ∼Y, if X Y and Y X. This is true forX ={a, b, c} and Y ={b, c} in Example 1.
Several observations are easy.
Lemma 3. The relationis a quasi-order, and thus∼is an equivalence relation onP(S).
We will denote a∼-equivalence class containingX by [X]. Note that for any two membersX, Y ∈[X], we haveφ(X) =φ(Y). Indeed,X Y impliesX ⊆φ(Y) andφ(X)⊆φ(Y). Inverse inclusion follows fromY X.
There is a natural order6c on∼-classes: [X]6c[Y] ifX Y.
Lemma 4. The relation6cis a partial order on the set of∼-equivalence classes.
Each class [X] is ordered itself with respect to set containment.
In Example 1, we have that {a, b, c} ∼ {b, c}, and no more subsets are ∼- equivalent to{a, b, c}. Thus, [{b, c}] consists of two subsets, and{b, c} ⊆ {a, b, c}is the minimal (with respect to the order of containment) subset in that equivalence class of∼. Also{a, c} {b, c}, whence [{a, c}]6c[{b, c}].
Lemma 5. IfhS, φiis reduced, then each equivalence class[X] has a unique min- imal element with respect to the containment order.
Proof. Let us assume that there are two minimal members X1 and X2 in [X].
Without loss of generality we assume that there isx∈X1\X2. SinceX1X2 X1, we havex ∈φ(x2) and x2 ∈ φ(x1), for some x1 ∈X1, x2 ∈ X2. We cannot have x =x1, because, if so, thenφ(x) = φ(x2), which impliesx =x2, since our closure system is reduced. This would contradict to the choice ofx.
Thus,x6=x1, andx∈φ(x1). But then we can reduceX1toX0=X1\x⊂X1, which is still a member of [X] sinceX1X0 ⊂X1. This contradicts the minimality
ofX1 in [X].
The second relation we want to introduce in this section is between an element x∈S and a subset X ⊆S, which will be called a cover of x. (In lattice theory, the terminologynontrivial join cover is used.) We will write x / X, if x∈φ(X)\ S
x0∈Xφ(x0). This notion is illustrated in Example 1 by b /{a, c}. Note that it is not true that a /{b, c}, because a < b, so that a ∈ φ(b) for the corresponding standard closure operator.
We will call a subsetY ⊆Saminimal cover of an elementx∈S, ifY is a cover ofx, and for every other coverZ ofx,ZY impliesY ⊆Z. So a minimal cover of xis a coverY that is minimal with respect to the quasi-order, and minimal with respect to set containment within its∼-equivalence class [Y], as per Lemma 5.
To illustrate this notion, let us slightly modify Example 1. Rename element 0 by dand add a new 0 element: 0< d, resulting in a latticeL1withJ(L1) =J(L)∪{d}.
We will haveY ={a, c}as a minimal cover forb. Indeed, the only other cover for bisZ ={a, c, d}, for which we haveZ Y andY ⊆Z.
Lemma 6. For a reduced closure system, if x / X, then there exists Y such that x / Y,Y X andY is a minimal cover forx. In other words, every cover can be -reduced to a minimal cover.
Proof. ConsiderPx={[X] :x / X}, a sub-poset in the6c poset of∼-classes. If it is not empty, choose a minimal element in this sub-poset, say [Y], and letY be the unique minimal element in [Y] with respect to containment, which exists due to Lemma 5. Then Y X and x / Y. It remains to show that, for every other coverZ of x, Z Y implies Y ⊆Z. Indeed, since Z Y, we have [Z]6c [Y].
But [Y] is the minimal element in Px, hence, [Z] = [Y]. It follows that Y ⊆ Z, sinceY is the minimal element of [Y] with respect to containment order.
We finish this section by introducing theD-basis of a reduced closure system.
Definition 7. Given a reduced closure systemhS, φi, we define the D-basisΣD as a union of two subsets of implications:
(1) {y→x:x∈φ(y)\y, y∈S};
(2) {X →x:X is a minimal cover forx}.
Part (1) in the definition of theD-basis will also be called thebinary part of the basis, due to the fact that both the premise and the conclusion of implications in (1) are one-element subsets ofS.
For the closure systemhJ(L), φiassociated with the latticeLin Example 1, the D-basis consists of two implications: b→aand{a, c} →b.
Lemma 8. ΣD generateshS, φi.
Proof. We need to show that, for anyx∈S andX ⊆S such thatx∈φ(X)\X, the implicationX→xfollows from implications in ΣD.
Ifx∈φ({x0}), for somex0 ∈X, x0 6=x, thenX →xfollows fromx0 →xthat is in ΣD. So assume that x 6∈ φ({x0}), for any x0 ∈X. Then x / X. According to Lemma 6, there exists Y X such that x / Y, and Y is a minimal cover for x. ThenY →xis in ΣD. Besides, for each y ∈Y \X there exists xy ∈X such that y ∈ φ({xy}). Therefore, xy → y is in ΣD as well. Evidently, X → x is a
consequence ofY →xand{xy →y:y∈Y}.
5. Comparison of the D-basis and the dependence relation basis One of the bases discussed in [5] is thedependence relation basis. For a closure systemhS, φi, not necessarily reduced, the dependence relation basis is
Σδ ={X →y:y∈φ(X)\X andy /∈φ(Z) for allZ⊂X}.
SinceZ ⊆X impliesZ X, a minimal cover (as defined above) is automatically minimal with respect to containment. Thus we have the following connection.
Lemma 9. For a reduced closure system,ΣD⊆Σδ.
For later reference, the dependence relation δ from Monjardet [15] can be de- scribed byyδxwheneverx∈X for someX →yin Σδ.
In the next example and in the sequel, whenever there is no confusion, we will omit the braces in notations of subsets of some set S: {x},{a, b, c}, etc. will be denoted simplyx,abc, etc.
1 2 3 4
5
1
Figure 1. Example 10
Example 10. This example is based on Example 5 from[5]. Consider the closure system on S ={1,2,3,4,5} with the set of closed subsets F ={∅,1,2,3,4,12,13, 234,45,12345}. Then Σδ ={5→4,23→4,24→3,34→2,14→2,14→3,14→ 5,25→1,35→1, 15→2,35→2,15→3,25→3,123→5}.
All implications except 5 → 4 are of the form X → x, where x / X. On the other hand, not all covers X are minimal covers ofx. We can check that each of implications 15→2,35→2,15→3,25→3 does not represent a minimal cover.
For example, 2/15, but 14 15 and 2/14 is the minimal cover. In particular, D-basis consists of all implications from Σδ except the four indicated: ΣD={5→ 4,23→4,24→3,34→2,14→2,14→3,14→5,25→1,35→1,123→5}.
As this example demonstrates, theD-basis can be obtained from Σδ simply by removing some unnecessary implications. It turns out that the same can be done
for the big range of bases called direct unit bases. Moreover, it can be done in polynomial time in the size of the given basis. See Proposition 17 in the next section.
6. Direct basis versus ordered direct basis
The bases discussed in Bertet and Monjardet [5] are, in general, redundant: a proper subset of such a basis would generate the same closure system. For example, as we saw in the previous section, Σδfrom Example 5 was reduced to a smaller basis ΣD. Example 30 shows that the D-basis can also be redundant; see Remark 31.
While the desire to keep the basis as small as possible might be a plausible task, there is another property of a basis that could be better appreciated in a programming setting. Here we recall the definition of adirect basis.
If Σ is some set of implications, then letπΣ(X) =X∪S{B :A⊆X and (A→ B)∈Σ}. In order to obtainφΣ(X), for any X ⊆S, one would normally need to repeat several iterations ofπ: φ(X) =π(X)∪π2(X)∪π3(X). . ..
The bases for which one can obtain the closure of any setX performing only one iteration, i.e.,φ(X) =π(X), are called direct.
It follows from Theorem 15 of [5] that the dependency relation basis Σδ is direct.
Moreover, this basis is direct-optimal, meaning that no other direct basis for the same closure system can be found of smaller total size. (Thetotal size t(Σ) is the sum of the cardinalities of all sets participating in its implications. This will be less thans(Σ) if some sets are repeated.) In particular, any reduction of Σδwill cease to be direct. Thus, there is a apparent trade-off between the number of implications in the basis and the number of iterations one needs to compute the closures of subsets.
The goal of this section to implement a different approach to the concept of iteration. That would allow the same number of programming steps as with the iteration ofπ, while allowing us to reduce the bases to a smaller size.
Definition 11. Suppose the set of implicationsΣis equipped with some linear order
<, or equivalently, the implications are indexed asΣ = {s1, s2, . . . , sn}. Define a mapping ρΣ : P(S) → P(S) associated with this ordering as follows. For any X ⊆S, letX0=X. IfXk is computed and implication sk+1 isA→B, then
Xk+1=
Xk∪B, if A⊆Xk, Xk, otherwise.
Finally,ρΣ(X) =Xn. We will callρΣan ordered iterationofΣ.
Apparently, πΣ(X) ⊆ ρΣ(X), because all implications from Σ are applied to the original subsetX, while they are applied to potentially bigger subsets Xk in the construction for ρΣ(X). We note though that assuming the order on Σ is established, the number of computational steps to produce ρΣ(X) is the same as forπΣ(X).
Definition 12. The set of implications with some linear ordering on it,hΣ, <i, is called an ordered direct basis, if, with respect to this ordering,φΣ(X) =ρΣ(X)for allX ⊆S.
Our next goal is to demonstrate that ΣD is, in fact, an ordered direct basis.
Moreover, it does not take much computational effort to impose a proper ordering on ΣD.
Theorem 13. Let ΣD be the D-basis for a reduced closure system. Let< be any linear ordering on ΣD such that all implications of the form y → z precede all implications of the form X → x, where X is a minimal cover of x. Then, with respect to this ordering,ΣD is an ordered direct basis.
Proof. Suppose thatX⊆S andb∈φ(X)\X. We want to show thatbwill appear in one of theXk in the sequence that leads to ρ(X).
Ifb∈φ({a})\ {a} for somea∈X, then bwill appear in someXk, whena→b from ΣD is applied. So now assume thatb /∈φ({a}) for everya∈X. Thenb / X and, according to Lemma 6, there existsY X such thatb / Y andY is a minimal cover for y. It follows that for anyy ∈Y there exists a∈ X such thaty ∈φ(a).
All implications a→ y will be applied prior to any application with the minimal cover. It follows that by the time the implication sk, say Y →b, is tested against Xk−1, we will haveY ⊆Xk−1. Hence,Xk=Xk−1∪ {b}.
Corollary 14. D-basis is also ordered direct in its aggregated form.
Indeed, it follows from the fact that the only restriction on the order of the D-basis is to have its binary part prior to the rest of the basis.
Corollary 15. If ΣD = {s1, . . . , sm} is the D-basis of a reduced implicational systemΣ, then it requires timeO(m) to turn it into an ordered direct basis ofΣ.
Example 16. Consider the closure system withS={1,2,3,4,5,6}and the family of closed sets F = {1,12,13,4,45,134,136,1362,1346,13456,123456}. Then the D-basis of this system is ΣD = {5 → 4,14 → 3,23 → 6,6 → 3,15 → 6,24 → 6,24→5,3→1,2→1}. According to Theorem 13, a proper ordering that turns this basis into ordered direct can be defined, for example, as: (1)5→4, (2)6→3, (3)3→1, (4)2→1, (5)14→3, (6)23→6, (7)15→6, (8)24→6, (9)24→5.
1 2 3 4 5
Figure 1. Example 9
1 2
3 4
5 6
Figure 2. Example 16
1
Figure 2. Example 16
7. Processing of ordered basis versus forward chaining algorithm Theforward chaining algorithm was originally introduced in 1984 by W. Dowling and J.H. Gallier in the context of checking the satisfiability of Horn formulae [8].
In 1992, H. Mannila and K.J. R¨aih¨a [14] introduced theLINCLOSURE algorithm, which applies the same approach to expanding functional dependencies in Database Systems. In this section, we will look at the efficiency of this approach in comparison with afolklorealgorithm to computing closures for theD-Basis, an approach that can be generalized to any direct or ordered basis.
We will assume that the base set isS={x1, . . . , xn}, which can be interpreted as propositional variables, and the closure system is given by a unit basis Σ ={A1→ b1, . . . , Am→bm}.
The forward chaining procedure requires a pre-processing setup, during which it constructs three data structures: ClauseListi = {Aj : xi ∈ Aj} for i 6 n, Propositionsj =|Aj| andConsequentj ={bj} for j 6m, along with subset True
⊆S thought of as an input set, whose closure needs to be computed.
When forward chaining computes the closure ofTrue, for each newxi∈True, for eachAj∈ClauseListi, it decrements the value of Propositionsj by one. Whenever Propositionsj = 0,Consequentj is added to the setTrue.
Since every entry of Propositions will, in the worst case, be reduced to zero, the number of steps in computing the closure is bounded by the sizes(Σ), i.e., the combined length of the implications in the basis. Including the pre-processing steps, the forward chaining algorithm should requireO(s(Σ)) operations to compute the closure. If the closures of multiple sets are to be performed, of course, the setup steps can be abbreviated: only Propositions and True need to be updated for subsequent runs.
As noted in [18], forward chaining, while efficient in the worst case, generally underperforms the folklore algorithm of simply checking if each Aj is contained withinTrue, and if so appendingbj to True, until the ability of the algorithm to generate new True elements is exhausted. In particular, forward chaining does poorly on large sets, where we often only need to examine a fraction of the |Σ|
variables examined in the forward chaining procedure.
As an alternative to the forward chaining procedure, M. Wild [18] suggested an algorithm that considers the set difference Σ0 = Σ\ {Ak → bk : Ak 6⊆ True}.
For each (Aj →bj) ∈Σ0 it then addsbj to True and repeats as necessary. This algorithm retains the need for preprocessing in the form of ClauseLists. Though typically faster than forward chaining, Wild’s algorithm has a worst-case running time ofO(s(Σ)m2), which can cause problems for large values ofm.
Applying the folklore algorithm to processing ordered bases, theoretically, avoids the pitfalls of both forward chaining and Wild’s algorithm. It simply iterates from (A1 →b1) to (Am →bm) adding bi to True whenever Ai ⊆True. On one hand, its worst case processing time isO(s(Σ)) since we only need to iterate through the ordered basis once. At the same time, it takes a fraction of the time of the folklore approach on an non-ordered basis, which will require a minimum of two iterations in order to confirm that no new variables were added toTrue.
In testing the performance of these three algorithms, we generatedD-bases from the domains{1,2,3,4,5} through{1,2,3,4,5,6,7,8} and calculated the time nec- essary to derive the closure of some random subset of the set. For forward chaining and Wild’s algorithm, which require substantial preprocessing, we calculated the
Average Time (µs) Domain 5 Domain 6 Domain 7 Domain 8
Folklore 3.87 6.59 10.12 14.90
Forward Chaining (preprocessed) 7.74 11.32 15.99 21.58
Forward Chaining 30.34 46.10 66.74 91.48
Wild’s Algorithm (preprocessed) 10.14 14.51 19.66 26.16
Wild’s Algorithm 23.75 35.28 50.11 69.58
Table 1. Comparing Algorithm Processing Times
time with and without the preprocessing, which corresponds to the time required for computing the first closure on a basis versus the time for subsequent closures.
In our testing, the folklore algorithm considerably outperformed both forward chaining and Wild’s algorithm (without preprocessing) though its advantage fell, as the domain grew larger. For a domain of size 5, forward chaining and Wild’s al- gorithm took 2 and 2.66 times as long as folklore, respectively, which shrank to 1.45 and 1.76 as the domain grew to size 8. If we include preprocessing times, however, both algorithms continued to take over 4 times as long, with the relative time of forward chaining remaining relatively constant. Domains of sizes 5-8 corresponded to bases with an average of 8, 13, 19 and 27 implication, respectively.
Taking into an account that LINCLOSURE and Wild’s algorithm are normally performed on the agregated bases, we also ran a series of similar tests with the aggregated bases. Such a test is based on the fact that the D-basis is ordered direct in both the unit and the aggreagted form. The bases on domains of sizes 5-8 had an overage of 5,7,10 and 13 implications, respectively. The test showed even higher ratios of forward chaining (without preprocessing) times to the ordered direct processing times: from 3.04 for domain 5 to 2.1 for domain 8.
Noticeably, the ordered-basis approach does not actually require the represen- tation of propositions as S ={x1, . . . , xn} and implications as Σ = {s1, . . . , sm}, where each proposition has an associated integer value, necessary for indexing and traversingClauseList, Propositions, and Consequent. Though we can in principle take advantage of integer values in constructing our set of true values, we only require a set of satisfied propositions. By contrast, to use the forward chaining method on a basis without this representation would require significant overhead in hashing each proposition to its corresponding integer.
Additionally, the ordered-basis approach eliminates the need for pre-processing of the basis to store it in the form ofClauseList andConsequent. Since theD-basis is defined as the union of binary and non-binary sets of implications, which is reflected in the algorithm for producing it, we assume allD-bases are properly ordered. This is particularly important when the basis may not fit into main memory. Instead of having to individually access each ClauseListi when the propositional variable xi appears in True, the ordered-basis approach allows us to parse the basis in conveniently sized pieces.
There is at least one observation how the idea of the ordered basis may improve the performance of the forward chaining algorithm. Indexing the implications ac- cording to the proper order of the D-basis, whenever we add a variable toTrue, we may additionally maintain the index j of the implication from which it was derived. Then, when we process this variable, we only need to updatek-entries of Propositions where k≥j, saving us significant processing time for very large sets.
8. Building and optimizing the D-basis
We consider the D-basis a good alternative of any direct basis, since it has a smaller size than any direct basis and preserves the directness property, under a special ordering we define. In this section we consider an effective procedure to obtain theD-basis from any given direct basis, also to further optimize its binary part or to use the concept of the ordered sequence. The problem of obtaining the D-basis from othernon-direct bases is also tackled in [3].
As we saw in Lemma 9 and Example 10, theD-basis ΣD of any reduced closure system is a subset of the direct unit basis Σδ. The next statement shows that, given any direct unit basis, one can extract the D-basis from it in a polynomial time procedure.
Proposition 17. Let hS, φibe a reduced closure system. If the direct unit basisΣ for this system hasm implications, and|S|=n, then it requires timeO((nm)2)∼ O(s(Σ)2)to build theD-basisΣD equivalent toΣ.
Proof. Let hS, φi be the closure system on set S defined by Σ. By Lemma 9, ΣD⊆Σδ. According to Theorem 15 of [5], Σδcoincides with the canonical iteration- free basis introduced by M. Wild in [17]. Hence, by Corollary 17 of [5], Σδ is the smallest basis, with respect to containment, of all direct unit bases ofhS, φi.
Therefore, ΣD⊆Σδ ⊆Σ.
It follows that ΣD can simply be extracted from Σ by removing unnecessary implications. This amounts to finding the implicationsX →x, whereX will be a minimal join cover ofx, among the implications of Σ.
Note thatO(m) steps will be needed to separate binary implicationsy→xfrom X →x, where |X|>1. The number ofx∈S that appear in the consequence of implicationsX →xis at most the minimum ofmandn.
For every fixedx, it will take time O(m) to separate all implications X → x, and the number of such implications is at most m. If X1 → xand X2 →x are two implications in this set, we can decide in time O(mn) whether X1 X2 or x∈φ(y) for somey∈X2. If either holds,X2→xdoes not belong to theD-basis.
To check this, consider the closure systems Σi ⊆ Σ, i = 1,2 that consist of all binary implications of Σ, in addition toXi →x. Also, put an order on Σi, where all the binary implications precedeXi→x. Apparently,xis in the closure ofX2, in the closure system defined on S by Σ1, iff either X1 X2 or x6y for some y∈X2.
As pointed out in section 7, computation of the closure of any input set, either by the forward chaining algorithm, or by the ordered basis algorithm, is linear in the size of the input, which in this case is essentially the size of the binary part of Σ, orO(n2).
At the worst case, aboutO(m2) comparisons have to be made, for different covers X1, X2 of the same elementx, to determine the minimal ones. Hence, the overall
complexity isO(m2n2)∼O(s(Σ)2).
It follows from the procedure of Proposition 17 that the D-basis is obtained from any direct unit basis by removing implications X →x, for whichX is not a minimal cover ofxand|X|>1. In particular, the binary part of the direct basis, i.e., implications of the formy→x, remain in theD-basis.
We want to discuss a further optimization of the D-basis, as well as any other basis that has the same binary part as theD-basis. As was observed in section 2, for
a reduced closure system hS, φi, the elements ofS can be identified with elements of the closure lattice L, in such a way that J(L)⊆S ⊆L. This correspondence induces a natural order on S, with s 6 t if and only if φ(s) ⊆ φ(t). Thus, an implicationy→xbelongs to the D-basis iff x∈φ(y) iffx6y. The binary part of theD-basis then describes the partially ordered set (S,6).
Recall that, in the language of ordered sets, we say thaty covers xify > xand there is no elementz such thaty > z > x.
We can shorten the binary part of theD-basis, leaving only those implications y → xfor which y coversx in (S,6). This will come at the cost of the need to order the remaining implications. For example, if x→y, y →z, x→z are three implications from the binary part of some D-basis, then the last implication can be removed, under condition that the first two will be placed in that particular order into the ordered D-basis. More generally, suppose only covering pairs are to be included in the binary part of an ordered basis. Then the ordering of the implications should be such that, if x > y≥z > tin S with the strict inequalities being covers, thenx→y precedesz→t.
Recall also that if some set of implications Σ0is ordered, thenρΣ0(X), the ordered iteration of Σ0, is defined for everyX⊆S, see Definition 11.
Proposition 18. Let Σ1 be the binary part of the D-basis of a reduced closure system on a setS. If Σ1haskimplications and|S|=n, then there is anO(nk+n2) time algorithm that extractsΣ0⊆Σ1 describing the cover relation of join irreducible elements of closure system, and places the implications of Σ0 into a proper order.
Under this order,ρΣ0(y) =ρΣ1(y)for every y∈S.
Proof. We have the partially ordered set (S,6) of sizen, whose cover relation has at mostkpairs, thus, it will take timeO(nk+n2) to find the cover relation of this poset, see [10], also Theorem 11.3 in [9]. Let Σ0⊆Σ1be the set of all implications y → x, where y covers xin (S,6) . It remains to put these implications into a proper order. If (S,61) is any linear extension of (S,6), then one can take any order of Σ0 associated with this extension. Starting from the maximal element y of (S,61), write all implications y → x from Σ0, in any order, then pick next to maximal element z of (S,61) and write all implications z → t, in any order, then proceed with all elements of (S,61) in the same manner, in descending order
≥1. It remains to notice that there is an O(n+k) algorithm for producing the linear extension of partially ordered set withnelements andkpairs of comparable
elements, see Theorem 11.1 in [9].
Now we want to deviate slightly from the notion of ordered direct basis to the notion of ordered direct sequence of implications. Suppose Σ is some basis of a closure systemhS, φi. The ordered sequenceσ=hs1, . . . , stiof implications from Σ, not all necessarily different, is calledan ordered direct sequence from Σ, ifρσ(X) = φ(X) for everyX ⊆S.
The idea of ordered direct sequencing allows some further optimization of the D-basis. If Z =hz1, . . . , zki and T =ht1, . . . , tsi are two ordered sequences, then Z_T denotes their concatenation (the attachment ofT at the end ofZ).
Lemma 19. Supposeσ= Σ_1 Σ_2 Σ3 is an ordered direct sequence from some basis Σ, whereΣ1,Σ3consist of binary implications in proper order of Proposition 18,Σ2
consists of non-binary implications, andΣ2 can be put into arbitrary order without
changing the ordered direct status. If (A→y),(A →x)∈Σ2 and(y →x)∈Σ1, thenA→xcan be dropped fromΣ2 and replaced by an additionaly→xin Σ3. Proof. We need to show that wheneverY is an input set such thatx∈φ(Y), the replacement ofA→xbyy→xwill not affect computation ofρσ(Y).
Consider the case wheny6∈φ(Y). Then alsoA6⊆φ(Y), whence any implication with the premiseAwill never be applied in computation ofρσ(Y). The same is true for implications with premisey, so replacement ofA→xbyy→xcan trivially be done.
Now suppose thaty∈φ(Y). By assumption, we can takeA→xto be the last implication in the ordering of Σ2. So consider Yk, the result of ordered iteration of Σ_1 Σ2\(A→x) on the input set Y. Ify ∈Yk, then we can drop A→xfrom Σ2 and place y → x anywhere in proper order in Σ3, which will guarantee that x appears in ρ(Y). If y 6∈ Yk, then there is z ∈ Yk such that there exists some sequence in Σ3 from z to y. By assumption, Σ3 is in the proper order, hence any implication w →y precedes x→ t. Thus, we can place y →x in between those groups, following the proper order on all binary implications from Proposition 18.
After replacingA→xbyy→xin proper position of Σ3, we can still assume that the ordering of remaining part of Σ2can be arbitrary.
Corollary 20. SupposeΣD is theD-basis of some closure system. ConsiderΣ+D⊆ ΣD obtained from ΣD by performing the following reductions:
(a) RemoveA→x, ifA→y andy→xare also inΣD. (b) Removez→x, if z→y andy→xare also inΣD.
Let Σ1 be a the proper ordering of binary part of Σ+D given in Proposition 18, and let Σ3 be a subordering of this proper ordering on implications y →x that appear in triples of A → x, A → y, y → x of (a). Finally, let Σ2 be some ordering of non-binary implications ofΣ+D. Then σ= Σ_1 Σ_2 Σ3 is the ordered direct sequence for the basis Σ+D. In particular, the length of this sequence is no longer than the length of theD-basis.
Proof. Indeed, following the procedure of Lemma 19 we can replace allA→xfrom the triplesA→x, A→y, y→xin ΣD by the second copy ofy →xin additional binary part Σ3 that follows the non-binary part of theD-basis.
Example 21. Given the D-basis of the closure system: ΣD=h3→2,2→1,3→ 1,45 → 3,45 → 2,45 → 1i, we can produce a shorter basis Σ+D = {3 → 2,2 → 1,45→3} with the ordered direct sequence: σ=h3→2,2→1,45→3,3→2,2→ 1i. We note that Σ+D is only half as long as ΣD,, and its ordered direct sequence σ has the same length as D-basis with optimized binary part but the size of σ is smaller than that of the optimized D-basis.
9. Closure systems without D-cycles and theE-basis
It turns out that theD-basis can be further reduced, when an additional property holds in a closure system hS, φi. The results of this section follow closely the exposition given in [9], section 2.4.
We will writexDy, forx, y ∈S, if y ∈Y for some minimal cover Y of x. We note that theD-relation is a subset of the dependence relationδfrom section 5.
Definition 22. A sequence x1, x2, . . . , xn, where n > 1, is called a D-cycle, if x1Dx2D . . . xnDx1. A finite closure system hS, φiis said to be withoutD-cyclesif it has noD-cycles.
We note that the lattices of closed sets of closure systems withoutD-cycles are known in lattice-theoretical literature aslower bounded.
For every x∈S, letM(x) ={Y ⊆S :Y is a minimal cover ofx}. The family φ(M(x)) ={φ(Y) :Y ∈M(x)} is ordered by set containment, so we can consider its minimal elements. LetM∗(x) ={Y ∈M(x) :φ(Y) is minimal inφ(M(x))}.
We will writexEy, forx, y∈S, ify∈Y for someY ∈M∗(x). According to the definition, ifxEy thenxDy. On the other hand, the converse is not always true.
Example 23. Consider the closure system and itsD-basis from Example 16. We note that this closure system has noD-cycles. We have three minimal covers of6:
15,24 and23. Since φ(15) =S\2, φ(24) = S and φ(23) =S\45, we have only two of these covers in M∗(6): 15 and23. Thus, while6D4, we do not have 6E4.
We now define two sequences of subsets ofS, based on covers from M(x) and M∗(x), correspondingly.
LetD0 =E0 ={p∈S :p∈φ(p1, . . . , pk) impliesp∈φ(pi) for somei6k}. If Dk andEk are defined, thenDk+1=Dk∪ {s∈S: ifs / Y thens / Z for someZ⊆ Dk, Z Y andZ ∈ M(s)}. Similarly, Ek+1 =Ek ∪ {s ∈ S : ifs / Y then s / Z for some Z ⊆ Ek, Z Y andZ ∈M∗(s)}. Apparently, Ek ⊆Dk, for any k.
The following result is proved in [9], Theorem 2.51.
Lemma 24. IfhS, φiis a reduced closure system withoutD-cycles, then, for some k,S=Ek =Dk.
As a consequence, we can often shorten theD-basis for a closure system without D-cycles. We will say that s ∈ S has D-rank k = 0, if s ∈ D0, and k > 0, if s∈Dk\Dk−1. According to Lemma 24, everys∈S in a closure system without D-cycles has aD-rank.
Recall that a basis is called aggregated when all its premises are different. Ev- ery basis can be brought to the aggregated form by combining conclusions of all implications with the same premises.
Theorem 25. Let hS, φi be a reduced closure system without D-cycles. Consider a subsetΣE of theD-basis that is the union of two sets of implications:
(1) {y→x:x∈φ(y)}, (2) {X →x:X ∈M∗(x)}.
Then
(a) ΣE is a basis for hS, φi.
(b) ΣE is ordered direct.
(c) The aggregated form of ΣE is ordered direct.
Proof. To begin with, it is not true that every cover of an element x∈S refines to a cover inM∗(x), so ΣE must be ordered more carefully than ΣD. Nonetheless, mimicking the proof of Theorem 2.50 of [9], we can construct an order on ΣE that makes it an ordered direct basis. This will be done for the aggregated E-basis, proving parts (a) and (c) simultaneously; part (b) then follows.
Consider the aggregated form of ΣE. Given an implicationX →Y in this basis, letD∗(X →Y) be the maximalD-rank of elements inX, andD∗(X →Y) be the minimalD-rank of elements inY. ThenD∗(X →Y)< D∗(X →Y).
Order the implications following the rule: put the implications x → Y first (aggregated form of binary part of ΣE), and for the rest, if D∗(X1 → Y1) <
D∗(X2→Y2) thenX1→Y1 precedesX2→Y2in the order.
Claim. If X1→Y1 andX2→Y2 are in the aggregatedE-basis, andY1∩X26=∅, thenX1→Y1 precedes X2→Y2.
Indeed, take anyx∈Y1∩X2. If theD-rank ofxisk, thenD∗(X2→Y2)≥k≥ D∗(X1 →Y1) > D∗(X1 →Y1). Hence,X1 →Y1 will appear in the order before X2→Y2.
Now take any input set Z. We want to show that φ(Z) can be obtained when applying the aggregated basis in the described order. We argue by induction on the rank of an elementz∈φ(Z)\Z.
Ifz ∈D0, then it only can be obtained via some implicationx→Y, for some x∈Z, andz∈Y, and implicationsx→Y form an initial segment in the ordered sequence of the basis. Now assume that it is already proved that all elements of φ(Z)\Z of rank at mostkcan be obtained in some initial segment of the sequence for the basis. If we have now element z of rank k+ 1, then it can be obtained via an implication X →Y with X ⊆φ(Z), z ∈ Y, and D∗(X)< k+ 1. By the induction hypothesis, all elements inX⊆φ(Z)\Zcan be obtained via implications located in some initial segment of the sequence, and by the Claim above, all those implications precede X → Y. Thus, all implications producing elements of rank k+ 1 from φ(Z) will be located after the segment of the sequence producing all
rankkelements.
To illustrate the ordering of anE-basis, consider again the closure system given in Example 16. As we know from Example 23, ΣEexists and includes all implications of the D-basis, except 24 → 6. Elements 1,2,4 have D-rank 0; elements 3,5 have D-rank 1, andD-rank of 6 is 2. This allows to impose a proper ordering on implications of ΣE that turns it into ordered direct:
(1) 5→4, (2) 6→3, (3) 3→1, (4) 2→1, (5) 14→3, (6) 24→5, (7) 23→6, (8) 15→6. This basis is also aggregated.
Proposition 26. Suppose ΣD = {s1, s2, . . . , sn} is a D-basis of some reduced closure system hS, φi and|S|=m. It requires time O(mn2)to determine whether the closure system is withoutD-cycles, and if it is, to build its ordered direct basis ΣE.
Proof. Since theD-relation is a subset ofS2, it will contain at mostm2 pairs. On the other hand, it is built from implicationsX →x, so the other upper bound for pairs inD-relation ismn. Evidently, the closure system is withoutD-cycles iff its D-relation can be extended to a linear order. There exists an algorithm that can decide whetherhS, Dican be extended to a partial order onS in timeO(m+|D|), see Theorem 11.1 in [9]. We will see below that the rest of the algorithm will take timeO(mn2), which makes the total time also O(mn2).
Assuming the first part of algorithm provides a positive answer and there are no D-cycles, we proceed by finding the ranks of all elements. It will take at most n operations to find setD0: include pinto D0, if it does not appear as a conclusion in any (non-binary) implicationX →xof theD-basis, wherex / X. If the system is withoutD-cycles, thenπΣ(D0)\D0 gives elements of rank 1,π2Σ(D0)\πΣ(D0) elements of rank 2, etc. Note that πΣ(X) is defined in the beginning of section
6. Computation of πΣ(X) requires time O(mn), since Σ = ΣD in our case has n implications, and checking that the premise of each implication is a subset of X takes time O(m). After at most m iterations of π on D0, one would obtain the wholeS, whence,O(m2n) operations are needed to obtain the ranks of all elements from S. Assuming that m 6 n in most closure systems, this time will not beat O(mn2).
It remains to decide which implications from the D-basis should remain in the E-basis. To that end, for each element x ∈ S we need to compare the closures φ(X) of subsetsX, for whichX →xis in theD-basis, and choose for theE-basis those that are minimal. There is at mostnimplicationsX →x, for a givenx∈S, and the closure φ(X), for each such X, can be found in O(s(ΣD)) steps. It will take timeO(n2) to determine all minimal subsets amongO(n) given subsetsφ(X), associated with fixedx∈S. Hence, it will require timeO(mn2) for all x∈S.
The size of theE-basis will be at mostn, and it will take timeO(n2) to order it with respect to the rank of elements, per Theorem 25.
When a closure system hasD-cycles, the subset ΣE of ΣD, defined in Corollary 25, may not form a basis.
Example 27. Consider S ={1,2,3,4} and a closure operator defined by the D- basis
13→2,24→3,14→2,14→3.
This closure system has the cycle 2D3D2. It is easy to verify that ΣE has only 13 → 2 and 24 → 3, so the last two implications from the D-basis cannot be recovered fromΣE.
1 2 3 4
1
Figure 3. Example 27
Further results about closure systems without D-cycles, and more generally sys- tems whose closure lattice is join semidistributive, will be presented in [3].
10. D-basis versus Duquenne-Guigues canonical basis
We recall the definition of the canonical basis introduced by V. Duquenne and J.L. Guigues in [11], see also [4]. This applies to arbitrary closure systems, not just reduced ones.
Definition 28. The canonical basis of a closure system(S, φ)consists of implica- tionsX →Y forX, Y ⊆S, that satisfy the following properties:
(1) X ⊂φ(X) =Y;
(2) for anyφ-closed set Z, eitherX ⊆Z or Z∩X isφ-closed;
(3) ifW ⊆X,φ(W) =Y andW satisfies (2) in place ofX, thenW =X. The subsetsX⊂Swith properties (1) and (2) are usually calledquasi-closed, see [4]. The meaning of (2) is that addingX to the family of closed sets ofφproduces the family of closed sets of another closure operator. Property (3) indicates that among all quasi-closed subsets with the same closure one needs to choose the min- imal ones. This basis is calledcanonical, since it is minimal, in that no implication can be removed from it without alteringφ, and every other minimal implicational basis forφcan be obtained from it. In particular, no other basis can have a smaller number of implications. Note that here the implications are of the form X →Y, whereY is not necessarily a one-element set. We will also call it theD-G basis, to distinguish from canonical unit direct basis.
To bring this basis in comparison with other bases discussed in this paper, each implicationX →Y may be replaced by set of implicationsX →y,y∈Y \X. We will call this modification of the canonical basis theunit D-G basis.
In many cases the canonical basis may be turned into an ordered direct basis.
Example 29. Consider again the closure system from Example 16. The canonical basis is
2→1,3→1,5→4,6→3,6→1,14→3,123→6,1345→6,12346→5. Besides, it is ordered direct in the given order.
In general, though, the canonical basis cannot be ordered so that it becomes direct. Thus, it is not ordered direct. The following two examples were uncovered by running a computer program and checking about a million of various closure systems on 5- and 6-element sets. The first example demonstrates a closure system, where the canonical basis cannot be ordered, while the unit expansion of this basis does admit an ordering to make it direct. The second example shows that some canonical bases cannot be ordered in either form.
Example 30.
LethS, φibe a closure system onS={1,2,3,4,5,6}, given by the family of closed sets: {∅,1,2,3,4,6,36,26,13,24,14,35,23,16,135,136,236,1246,2345, S}. The lat- tice representation of this system is given in Figure 4.
Then the canonical basis is 5→3,34→25,12→46,46→12,235→4,356→ 124. It is easy to show that this basis cannot be ordered. Indeed, in order to obtain φ(145) =Sin one application of canonical basis, one would need to put 5→3 first, then 34→25, followed by 12→46. On the other hand,φ(123) =S, too, and the only implication applicable to 123 is 12→46, but it comes after 34→25, and one cannot obtain 5 in the closure otherwise.
As was mentioned, the unit expansion of this canonical basis is still ordered direct: one would need to place implications 12→4 and 12 →6 around 34→ 2 and 34→5, thusly: 5→3,12→4,34→2,34→5,12→6,46→2,46→1,235→ 4,356→1,356→2,356→4.
1 2
3 4
5
6
Figure 1. Example 66
1
Figure 4. Example 30
As always, theD-basis is ordered direct in both forms: in its original unit form, and in the aggregated form. For example, the aggregated form of D-basis in this example is 5→3,34→25,12→46,46→12,25→4,56→124,123→5,134→6.
One needs to run the canonical basis two times to ensure the closure of arbitrary subset, i.e., apply 6·2 = 12 implications, versus only 8 implications of the aggregated D-basis. In the unit form, the canonical basis has 11 implications and theD-basis has 13, but the ordering of the canonical basis requires special care.
Remark 31.
Example 30 also shows that the D-basis, unlike the canonical basis, can be redundant (even in its aggregated form): this means that some implications can be removed, and the remaining ones still define the same closure system. In the D-basis of our example, both implications 123→5,134→6 can be removed, since they follow from 34 → 25,12 → 46. On the other hand, the basis without these two implications is no longer ordered direct.
The following example shows that the canonical basis might be un-orderable in either form.
Example 32.
Let hS, φi be a closure system on S = {1,2,3,4,5,6}, given by the family of closed sets: {∅,1,2,3,5,6,12,13,14,16,23,123,124,135,256,1346, S}. The lattice representation of this system is given in Figure 5.
The canonical basis has 9 implications:
4→1,15→3,35→1,25→6,56→2,26→5,36→14,134→6,146→3.
There is a single implication 36→14 that can be expanded to two unit implications 36→1 and 36→4.
The proof that the unit expansion of canonical basis cannot be ordered to make it direct, follows from consideration of the next three closures:
• 45 →145 → 1345 →13456 → S, hence, 134→ 6 should be placed later than 15→3.
• 1234→12346→S, hence 26→5 should be placed later than 134→6.
1 2 3 5 4
6
Figure 1. Example 67
1
Figure 5. Example 32
• 126 → 1256 → 12356 → S, hence 15 → 3 should be placed later than 26→5, which contradicts the combination of the previous two items.
For comparison, the aggregatedD-basis has 15 implications:
4 →1,45→26,36→ 14,34→6,15→3,46→3,35→1,25→6,26→5,56→ 2,126→34,235→4,156→4,234→5,125→4.
Thus, one run of the aggregatedD-basis (15 implications) wins over two runs (18 implications) of the canonical basis. In unit expansions: D-basis (18 implications) still wins over two runs (20) of canonical basis.
In this example, the D-basis is 4 implications shorter than the canonical unit direct basis, which has 22 implications.
Our earlier analysis of the binomial part of theD-basis in Proposition 18 carries over to a partial optimization of the canonical basis.
Proposition 33. The binary part of the unit expansion of the D-G canonical basis of any reduced closure system coincides with the binary part of the D-basis (or, E-basis, if it exists) of the same system.
Proof. We recall that the binary part of theD-basis of closure systemhS, φiconsists of implicationsy →x, wherex∈φ(y)\y. This implies that{y}is not aφ-closed set. Besides, it is a quasi-closed set, since the intersection of{y}with anyφ-closed set is either {y} or ∅. Evidently, {y} will be the minimum quasi-closed set with the closureφ(y). Hence,{y} →φ(y)\y should be an implication in the canonical basis. Evidently, the unit expansion of {y} → φ(y)\y gives all the implications in theD-basis with the premise y. Vice versa, every implication in the canonical basis of the formy→Y implies thatY =φ(y)\y. Hence,y→y0 fory0∈Y should
appear in theD-basis.
The following statement is an immediate consequence of Proposition 18 and Proposition 33. We recall that Lstands for the lattice of closed sets ofhS, φi, and (J(L),6) is a partially ordered set of join-irreducible elements ofL.
Corollary 34. Let ΣC be the canonical basis ofhS, φi, where |S|=m. Let ΣbC⊆ ΣC be a binary part of ΣC, and let n be the number of implications in the unit