Frequent patterns are itemsets or sequences that occur often in a data set. They form the basis for much of data mining, since they can be used to find associations such as products that sell well together.

**Transactional database**

Starting out with a set of items, I = {apple, orange, soap, candy, scotch tape}, each transaction is a subset of I. The collection of all these transactions is called a transactional database, or TDB.

T1 – {apple, orange}

T2 – {orange, soap}

T3 – {orange, soap, candy, scotch tape}

In order to find patterns, we first define a minimum support threshold. The support of an itemset is the number of transactions in the TDB that contain it, and an itemset is frequent if its support meets the threshold. With a minimum support of 2, we get the following frequent itemsets:

{orange}, {soap}, {orange, soap}

We can also speak of relative support: {orange} occurs in all three transactions, so its relative support is 100%, while {soap} occurs in two of three, giving a relative support of roughly 67%.
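The definitions above can be sketched with a brute-force miner over the example TDB. This is only an illustrative sketch; the helper names (`support`, `frequent_itemsets`) are my own, and real miners avoid enumerating every subset.

```python
from itertools import combinations

# The example TDB from above.
tdb = [
    {"apple", "orange"},
    {"orange", "soap"},
    {"orange", "soap", "candy", "scotch tape"},
]

def support(itemset, tdb):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in tdb if itemset <= t)

def frequent_itemsets(tdb, min_support):
    """Brute-force enumeration of all itemsets meeting min_support."""
    items = sorted(set().union(*tdb))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), tdb)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

freq = frequent_itemsets(tdb, min_support=2)
# At min_support=2 this yields {orange}, {soap}, and {orange, soap}.
```

Dividing each support by the number of transactions gives the relative support, e.g. 3/3 for {orange}.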

In general we want to reduce the number of patterns to save storage space; moreover, enumerating all the frequent itemsets in a TDB is often computationally expensive.

**Equivalence classes**

An equivalence class is a set of itemsets which always occur in the same set of transactions.
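A convenient way to see this is through tidsets: the set of transaction indices an itemset occurs in. Itemsets with identical tidsets form one equivalence class. A small sketch (the `tidset` helper is my own naming):

```python
# The example TDB from above.
tdb = [
    {"apple", "orange"},
    {"orange", "soap"},
    {"orange", "soap", "candy", "scotch tape"},
]

def tidset(itemset, tdb):
    """Indices of the transactions that contain the itemset."""
    return {i for i, t in enumerate(tdb) if itemset <= t}

# {soap} and {orange, soap} occur in exactly the same transactions
# (T2 and T3), so they belong to the same equivalence class.
```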

**Types of patterns**

Maximal pattern – For a given itemset x in a set A of itemsets, x is maximal in A if and only if there is no other itemset in A that is a superset of x.

Minimal pattern – For a given itemset x in a set A of itemsets, x is minimal in A if and only if there is no other itemset in A that is a subset of x.

Closed pattern – The maximal pattern of an equivalence class; equivalently, a pattern with no proper superset of equal support.

Generator – A minimal pattern of an equivalence class (a class may have several generators, but only one closed pattern).
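These definitions can be made concrete by grouping itemsets into equivalence classes by tidset and picking out the maximal and minimal members. A brute-force sketch, assuming the small example TDB (the function name `closed_and_generators` is mine):

```python
from itertools import combinations

tdb = [
    {"apple", "orange"},
    {"orange", "soap"},
    {"orange", "soap", "candy", "scotch tape"},
]

def closed_and_generators(tdb):
    """For each equivalence class (itemsets sharing one tidset), return
    (closed pattern, generators, support)."""
    items = sorted(set().union(*tdb))
    classes = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            ts = frozenset(i for i, t in enumerate(tdb) if set(combo) <= t)
            if ts:  # skip itemsets that occur nowhere
                classes.setdefault(ts, []).append(frozenset(combo))
    result = []
    for ts, members in classes.items():
        closed = max(members, key=len)  # the unique maximal member
        generators = [m for m in members
                      if not any(other < m for other in members)]
        result.append((closed, generators, len(ts)))
    return result
```

For the class occurring in T2 and T3, the closed pattern is {orange, soap} and the generator is {soap}: adding orange to soap never loses a transaction, since every transaction with soap also has orange.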

**Downward closure property**

If X is a frequent itemset in a TDB, then all subsets of X are also frequent.

Conversely, if X is an infrequent itemset in a TDB, then all supersets of X are also infrequent.
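This property is what makes level-wise (Apriori-style) mining feasible: a candidate k-itemset only needs counting if all of its (k-1)-subsets were already found frequent. A minimal sketch under that pruning rule (not a production implementation):

```python
from itertools import combinations

tdb = [
    {"apple", "orange"},
    {"orange", "soap"},
    {"orange", "soap", "candy", "scotch tape"},
]

def apriori(tdb, min_support):
    """Level-wise mining that prunes candidates via downward closure."""
    def support(s):
        return sum(1 for t in tdb if s <= t)

    frequent, level = {}, []
    for i in sorted(set().union(*tdb)):
        s = support(frozenset([i]))
        if s >= min_support:
            frequent[frozenset([i])] = s
            level.append(frozenset([i]))
    k = 2
    while level:
        # Keep only unions whose every (k-1)-subset is frequent;
        # anything else is infrequent by downward closure, no counting needed.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k
                      and all(frozenset(sub) in frequent
                              for sub in combinations(a | b, k - 1))}
        level = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                level.append(c)
        k += 1
    return frequent
```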

**Convexity**

A pattern space (any set of patterns) A is convex if every pattern that is a superset of some minimal pattern in A and a subset of some maximal pattern in A is itself present in A.
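Equivalently: whenever x and z are in A and x ⊆ y ⊆ z, then y must also be in A. A brute-force check of this condition (exponential in the gap between x and z, so a sketch only; `is_convex` is my own name):

```python
from itertools import combinations

def is_convex(patterns):
    """A is convex if whenever x, z are in A and x <= y <= z, y is in A."""
    patterns = set(map(frozenset, patterns))
    for x in patterns:
        for z in patterns:
            if x < z:
                between = z - x
                for k in range(1, len(between)):
                    for extra in combinations(between, k):
                        if x | frozenset(extra) not in patterns:
                            return False
    return True

# {a} ⊆ {a, c} ⊆ {a, b, c}, so omitting {a, c} breaks convexity.
```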

**Representations of datasets**

We can reduce the size of a dataset (compression) by using various representations, which select representative itemsets to stand in for the full pattern collection. A representation may have two properties:

Lossless

- Every pattern in the dataset and its support can be derived from the representation without access to the original dataset.

Concise

- The size of the representation is smaller than the original dataset.

Selecting the maximal patterns in a dataset is concise but lossy: we can tell which itemsets are frequent, but not recover their supports.

Selecting the closed patterns in a dataset is concise and lossless.
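The reason closed patterns are lossless is that any itemset's support equals the largest support among its closed supersets (the smallest closed pattern containing it). A sketch using the closed patterns of the example TDB, with supports worked out by hand from the transactions (`derived_support` is my own name):

```python
# Closed patterns of the example TDB and their supports.
closed = {
    frozenset({"orange"}): 3,
    frozenset({"apple", "orange"}): 1,
    frozenset({"orange", "soap"}): 2,
    frozenset({"orange", "soap", "candy", "scotch tape"}): 1,
}

def derived_support(itemset, closed):
    """Recover an itemset's support from the closed-pattern representation."""
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else 0

# {soap} is not itself closed, yet its support (2) is still recoverable:
# its closed supersets have supports 2 and 1, and the maximum is 2.
```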