References
http://www.catonmat.net/blog/set-operations-in-unix-shell/
http://en.wikipedia.org/wiki/Set_theory
http://en.wikipedia.org/wiki/Association_rule_learning
http://en.wikipedia.org/wiki/Precision_and_recall
http://michael.hahsler.net/research/association_rules/measures.html#lift
Get the Code!
git clone https://github.com/bmtober/sets
sets
|-- docs
| \`-- images
| \`-- pic
|-- src
\`-- test
|-- data
\`-- expected
Install to /usr/local/bin
cd sets
sudo make install
Install to specified location
cd sets
PREFIX=~/.local/bin make install
Review of Set Theory
What is a Set
A collection of arbitrary objects:
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
Set Membership (Contains)
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ text{milk} in A, text{milk} notin B $
Cardinality
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ |A| = 5 $
$ |B| = 6 $
$ |X| = sum_i delta(x_i in X) $
Empty Set
$ {} $
$ forall x, x notin {}$
$ |{}| = 0 $
Union
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ A uu B = {x | x in A or x in B }
= {text{bacon}, text{bread}, text{butter}, text{eggs}, text{ham}, text{jam}, text{juice}, text{milk}, text{sausage}} $
Intersection
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ A nn B = {x | x in A and x in B }
= {text{bread}, text{eggs}} $
Disjoint
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ D = {text{ham}, text{steak}, text{chicken}, text{fish}} $
$ A nn D = {} $
Relative Complement
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ A - B = {text{jam}, text{juice}, text{milk}} $
Relative Complement
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ B - A = {text{bacon}, text{butter}, text{ham}, text{sausage}} != A - B $
Symmetric Difference
$ A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ A Delta B = (A uu B) - (A nn B) =
{text{bacon}, text{butter}, text{ham}, text{jam}, text{juice}, text{milk}, text{sausage}} $
Subset (Includes)
$ F sube B iff x in F => x in B $
$ F - B = {} $
$ {text{butter}, text{bacon}} sube
{text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
Set Equality
$ X = Y iff X sube Y and Y sube X $
That is, $X$ and $Y$ have exactly the same membership, and
consequently, $ |X| = |Y|$.
Proper Subset
$ F sub B iff F sube B and F != B $
Note
$ |F| < |B| $
$ B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ F = {text{butter}, text{bacon}} $
$ F sub B $ since all members of $F$ are members of $B$ but $F != B$.
Cross Product
$ D = {text{ham}, text{steak}, text{chicken}, text{fish}} $
$ F = {
text{butter}, text{bacon},
} $
$ F xx D =
{
text{(bacon, chicken)},
text{(bacon, fish)},
text{(bacon, ham)},
text{(bacon, steak)},
$
$
text{(butter, chicken)},
text{(butter, fish)},
text{(butter, ham)},
text{(butter, steak)}
}$
$xx$ | chicken | fish | ham | steak |
---|---|---|---|---|
bacon |
(bacon, chicken) |
(bacon, fish) |
(bacon, ham) |
(bacon, steak) |
butter |
(butter, chicken) |
(butter, fish) |
(butter, ham) |
(butter, steak) |
Power Set
$ D = {text{ham}, text{steak}, text{chicken}, text{fish}} $
$ cc"P"(D) = {
text{{ham}},
text{{steak ham}},
text{{steak}},
text{{chicken ham}},
text{{chicken steak ham}},
text{{chicken steak}},
text{{chicken}}, $
$
text{ }
text{{fish ham}},
text{{fish steak ham}},
text{{fish steak}},
text{{fish chicken ham}}, $
$
text{ }
text{{fish chicken steak ham}},
text{{fish chicken steak}},
text{{fish chicken}},
text{{fish}},
text{{}}
} $
Applications of Set Theory
-
Item Sets
-
Association Rules
-
Metrics
Item Sets
$ I = {i_1, i_2, ... i_m} $ is the set of all items
$ = {text{bacon}, text{bread}, text{butter}, text{chicken}, text{eggs}, text{fish},
text{ham}, text{jam}, text{juice}, text{milk}, text{sausage}, text{steak}}$
$ T = {t_1, t_2, ... t_n} $ is a set of transactions where each $t_i$
is a subset of $I$ ... an itemset like we saw earlier:
$ t_1 = A = {text{milk}, text{juice}, text{bread}, text{jam}, text{eggs}} $
$ t_2 = B = {text{butter}, text{bacon}, text{sausage}, text{ham}, text{eggs}, text{bread}} $
$ t_3 = C = {} $
$ t_4 = D = {text{ham}, text{steak}, text{chicken}, text{fish}} $
$ t_5 = E = {text{milk}, text{bread}, text{eggs}} $
$ t_6 = F = {text{butter}, text{bacon}} $
$ t_7 = G = {text{milk}, text{bacon}, text{bread}, text{jam}, text{eggs}} $
$ t_8 = H = {text{bacon}, text{bread}, text{butter}, text{eggs}, text{jam}} $
$ t_9 = I = {text{bacon}, text{bread}, text{ham}, text{juice}, text{milk}, text{sausage}} $
$ t_10 = J = {text{bacon}, text{bread}, text{eggs}, text{milk}, text{sausage}} $
$ |T| = 10 $ for this example.
Association Rules
$ X => Y $, i.e., observation of $X$ implies probable presence of $Y$
$X nn Y = {}$, i.e., disjoint
$ {text{milk}, text{bread}} => {text{jam}} $
Rule Evaluation Metrics
-
Support
-
Confidence
-
Lift
-
Conviction
Support
$ text{supp}(X) = (sum_(i=1)^n delta((X) sube t_i))/|T| $
$ X = {text{milk}, text{bread}} $
$ X sube A, E, G, I, J $
$ text{supp}(X) = 5/10 $
Support for an Association Rule
$ text{supp}(X => Y) = (sum_(i=1)^n delta((X uu Y) sube t_i))/|T| $
$ X = {text{milk}, text{bread}} $
$ Y = {text{jam}} $
$ X uu Y sube A, G $
$ text{supp}(X => Y) = 1/5 $
Confidence
$ text{conf}(X => Y) = (text{supp}(X uu Y))/(text{supp}(X))$
$ X = {text{milk}, text{bread}} $
$ Y = {text{jam}} $
$ X sube A, E, G, I, J $
$ X uu Y sube A, G $
$ text{conf}(X => Y) = (2//10)/(5//10) = 2/5 $
Lift
$ text{lift}(X => Y) = (text{supp}(X uu Y))/(text{supp}(X)xxtext{supp}(Y)) = (text{conf}(X uu Y))/(text{supp}(Y))$
$ X = {text{milk}, text{bread}} $
$ Y = {text{jam}} $
$ X uu Y sube A, G $
$ X sube A, E, G, I, J $
$ Y sube A, G, H $
$ text{lift}(X => Y) = (2/10)/((5/10)*(3/10)) = 4/3 $
Conviction
$ text{conv}(X => Y) = (1 - text{supp}(Y))/(1 - text{conf}(X => Y)) = 1/(text{lift}(X => bar Y))$
$ X = {text{milk}, text{bread}} $
$ Y = {text{jam}} $
$ Y sube A, G, H $
$ X uu Y sube A, G $
$ X sube A, E, G, I, J $
$ text{conv}(X => Y) = (1 - 3/10)/(1 - 2/5) = (7//10)/(3//5) = 7/6 $
Other Measures
-
Precision
-
Recall
-
Accuracy
Confusion Matrix
Predicted Condition |
|||
False |
True |
||
Actual Condition |
False |
$m_(00)$ |
$m_(01)$ |
True |
$m_(10)$ |
$m_(11)$ |
$m_(ij)$ is the number of results actually in class $i$ and identified as class $j$
$i=j$ implies correct classification
$i!=j$ implies incorrect classification
$m_(00)$ = True Negative
$m_(10)$ = False Negative
$m_(11)$ = True Positive
$m_(01)$ = False Positive
Confusion Matrix
$ text{bread, milk} notin t_i $ |
$ text{bread, milk} in t_i $ |
||
$ text{jam} notin t_i $ |
$B,C,D,F$ |
$E,I,J$ |
|
$ text{jam}\ in t_i $ |
$H$ |
$ A,G $ |
Precision
How useful the search results are
$ text{Precision} = (m_(11))/(m_(11) + m_(01)) = 2/(2+3) = 2/5 $
Recall
How complete the results are
$text{Recall} = (m_(11))/(m_(11) + m_(10)) = 2/(2+1) = 2/3 $
Accuracy
Total number of correct predictions as a fraction of total decisions
$ A = (m_(00) + m_(11))/(m_(00) + m_(01) + m_(10) + m_(11)) = (4+2)/(4+3+2+1) = 3/5 $