Lecture 3: Association rules and frequent sequence mining

Quan Nguyen, Department of Statistics, University of British Columbia

1. Introduction to association rules mining

Association rules look for combination of items that frequently occur together in transactions (a.k.a. frequent set items). For example, retail basket analysis is one of the most common applications of association rules. By finding frequent itemsets, retailers could understand what is commonly bought together and use this information to increase sales in various ways (e.g., place frequently bought together items next to each other in a store).

Transaction_ID

items

1

milk, bread

2

bread, butter

3

beer, avocado, pizza

4

milk, bread, butter

5

bread, butter

What is an example of a frequent itemset from the table above?

How can association rules be applied in education?

If we take the analogy of a market basket analysis, each student is a ‘customer’ and in each semester, they will make a ‘transaction’ by enrolling in a set of courses. The data will often take the forms of term_ID,student_ID, and course_ID.

term_ID

student_ID

course_ID

1

Quan

STAT_101

1

Quan

PSY_101

1

Quan

ECON_101

2

Quan

STAT_201

2

Quan

CS_101

2

Quan

POL_101

1

Chris

CS_101

1

Chris

MATH_101

1

Chris

PSY_101

2

Chris

CS_201

2

Chris

STAT_101

Similar data but in a different format

term_ID

student_ID

transaction_ID

course_ID_set

1

Quan

1

STAT_101, PSY_101, ECON_101

2

Quan

2

STAT_201, CS_101, POL_101

1

Chris

3

CS_101, MATH_101, PSY_101

2

Chris

4

CS_201, STAT_101

Using the information available in the Student Information Systems (SIS), we can answer research questions such as:

  • Which set of courses are often taken together? -> A-priori algorithm

  • Which set of courses are often taken following a set of courses? -> SPADE algorithm

What are the limitations of association rules?

Note that association rules only focus on the most frequent (popular) items, and it misses out all the information that are present in the “long tail” of user preference that are essential to provide customized recommendations to customers/users. In such scenarioes, other recommender systems such as collaborative filtering (i.e. recommendations based on user similarity) or content-based filtering (i.e., recommendations based on item similarity) will help detect customers with similar interests even if the absolute number of transactions is small (e.g., customer who are interested in a niche category).

library(tidyverse)
library(arules)
library(arulesViz)
library(arulesSequences)
── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──

 ggplot2 3.3.5      purrr   0.3.4
 tibble  3.1.5      dplyr   1.0.7
 tidyr   1.1.3      stringr 1.4.0
 readr   2.0.1      forcats 0.5.1

Warning message:
“package ‘tibble’ was built under R version 4.1.1”
Warning message:
“package ‘readr’ was built under R version 4.1.1”
── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()

Warning message:
“package ‘arules’ was built under R version 4.1.1”
Loading required package: Matrix


Attaching package: ‘Matrix’


The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack



Attaching package: ‘arules’


The following object is masked from ‘package:dplyr’:

    recode


The following objects are masked from ‘package:base’:

    abbreviate, write


Warning message:
“package ‘arulesViz’ was built under R version 4.1.1”

Attaching package: ‘arulesSequences’


The following object is masked from ‘package:arules’:

    itemsets

Let’s create a sythetic dataset

df <- data.frame(matrix(ncol = 2, nrow = 1000))
colnames(df) <- c("id","course")

set.seed(123)
df$id <- sample(x=1:250,1000,replace = T)

set.seed(123)
df$course <- sample(x=c("PSY","MATH","STATS","ECON","PHY","ENG","POLS","BIO","CHEM"),1000,replace=T)

df <- df %>% distinct() %>% arrange(id)
head(df)
A data.frame: 6 × 2
idcourse
<int><chr>
11POLS
22ECON
32MATH
42STATS
53ECON
64BIO

Let’s transform this data frame into transaction format

df <- df %>% group_by(id) %>% 
  summarise(courselist = paste0(course, collapse = ","),count = n())
head(df)
A tibble: 6 × 3
idcourselistcount
<int><chr><int>
1POLS 1
2ECON,MATH,STATS 3
3ECON 1
4BIO,PHY,CHEM,ENG4
5PSY,BIO 2
6ENG,BIO 2
# convert to transaction data object
df$courselist <- as.factor(df$courselist)

# export to csv
write.csv(df$courselist,"course.csv", row.names = FALSE, col.names = FALSE, quote = FALSE)

# read in the csv file
df_obj <- read.transactions('course.csv', format = 'basket', sep=',')
df_obj
Warning message in write.csv(df$courselist, "course.csv", row.names = FALSE, col.names = FALSE, :
“attempt to set 'col.names' ignored”
transactions in sparse format with
 247 transactions (rows) and
 10 items (columns)

2. Association measures

Support

The support metric tells us how popular a set of items is relative to the total number of transactions

itemFrequencyPlot(df_obj,topN=20,type="relative", main="Relative Item Frequency Plot")
../_images/Lecture3_12_0.png

Confidence

The confidence metric tells us how likely an item Y is purchased given item X is purchased.

confidence = P(X,Y) / P(X)

Note that the confidence measure might misrepresent the importance of an association. This is because it only accounts for how popular item X is (in our case PSY) but not Y (in our case POLS).

If beer is also very popular in general, there will be a higher chance that a transaction containing PSY will also contain POLS, thus inflating the confidence measure. To account for the base popularity of both items, we use a third measure called lift.

Lift

Lift tells us how likely item Y is purchased when item X is purchased, while controlling for how popular items Y and X are

  • lift = 1: implies no association between items.

  • lift > 1: greater than 1 means that item Y is likely to be bought if item X is bought,

  • lift < 1: less than 1 means that item Y is unlikely to be bought if item X is bought.

3. A-priori algorithm with arules

association.rules <- apriori(df_obj, parameter = list(supp=0.02, conf=0.6,maxlen=6))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5    0.02      1
 maxlen target  ext
      6  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 4 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [19 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
inspect(association.rules[1:10])
     lhs                   rhs     support    confidence coverage   lift    
[1]  {MATH, PHY}        => {CHEM}  0.06882591 0.6800000  0.10121457 1.646667
[2]  {ECON, ENG, MATH}  => {STATS} 0.02429150 0.6000000  0.04048583 1.610870
[3]  {ENG, MATH, STATS} => {ECON}  0.02429150 0.6000000  0.04048583 1.807317
[4]  {ENG, MATH, PHY}   => {CHEM}  0.02429150 0.7500000  0.03238866 1.816176
[5]  {ECON, MATH, PHY}  => {CHEM}  0.02024291 0.7142857  0.02834008 1.729692
[6]  {BIO, ECON, MATH}  => {PSY}   0.02024291 0.6250000  0.03238866 1.882622
[7]  {MATH, PSY, STATS} => {ECON}  0.02024291 0.6250000  0.03238866 1.882622
[8]  {MATH, PHY, PSY}   => {CHEM}  0.02429150 0.7500000  0.03238866 1.816176
[9]  {CHEM, MATH, PSY}  => {PHY}   0.02429150 0.6666667  0.03643725 1.937255
[10] {MATH, PHY, STATS} => {CHEM}  0.02024291 0.7142857  0.02834008 1.729692
     count
[1]  17   
[2]   6   
[3]   6   
[4]   6   
[5]   5   
[6]   5   
[7]   5   
[8]   6   
[9]   6   
[10]  5   
# Finding Rules related to given items

# Find what students enrolled before choosing 'ECON'
item.association.rules <- apriori(df_obj, parameter = list(supp=0.02, conf=0.6),
                                  appearance = list(default="lhs",rhs="ECON"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5    0.02      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 4 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [2 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
inspect(head(item.association.rules))
    lhs                   rhs    support    confidence coverage   lift    
[1] {ENG, MATH, STATS} => {ECON} 0.02429150 0.600      0.04048583 1.807317
[2] {MATH, PSY, STATS} => {ECON} 0.02024291 0.625      0.03238866 1.882622
    count
[1] 6    
[2] 5    
# Find what students enrolled after choosing 'MATH' and 'PHY'
item2.association.rules <- apriori(df_obj, parameter = list(supp=0.02, conf=0.6),
                                   appearance = list(default="rhs",lhs=c("MATH","PHY")))

inspect(head(item2.association.rules))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5    0.02      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 4 

set item appearances ...[2 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [1 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
    lhs            rhs    support    confidence coverage  lift     count
[1] {MATH, PHY} => {CHEM} 0.06882591 0.68       0.1012146 1.646667 17   
plot(association.rules)
# plot(association.rules, engine = "plotly")
To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
../_images/Lecture3_21_1.png

There is a special value for shading called “order” which produces a two-key plot where the color of the points represents the length (order)

plot(association.rules,method="two-key plot")
To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
../_images/Lecture3_23_1.png

Graph-based techniques concentrate on the relationship between individual items in the rule set. They represent the rules (or itemsets) as a graph with items as labeled vertices, and rules (or itemsets) represented as vertices connected to items using arrows.

For rules, the LHS items are connected with arrows pointing to the vertex representing the rule and the RHS has an arrow pointing to the item.

plot(association.rules, method = "graph")
# plot(association.rules, method = "graph",  engine = "htmlwidget")
../_images/Lecture3_25_0.png
# Parallel plot
plot(association.rules, method="paracoord")
../_images/Lecture3_26_0.png

Let’s try a-priori with different levels of support

# Support and confidence values
supportLevels <- c(0.1, 0.05, 0.01, 0.005)
confidenceLevels <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)

# Empty integers 
rules_sup10 <- integer(length=9)
rules_sup5 <- integer(length=9)
rules_sup1 <- integer(length=9)
rules_sup0.5 <- integer(length=9)

# Apriori algorithm with a support level of 10%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup10[i] <- length(apriori(df_obj, parameter=list(sup=supportLevels[1], 
                                   conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 5%
for (i in 1:length(confidenceLevels)){
  
  rules_sup5[i] <- length(apriori(df_obj, parameter=list(sup=supportLevels[2], 
                                  conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 1%
for (i in 1:length(confidenceLevels)){
  
  rules_sup1[i] <- length(apriori(df_obj, parameter=list(sup=supportLevels[3], 
                                  conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 0.5%
for (i in 1:length(confidenceLevels)){
  
  rules_sup0.5[i] <- length(apriori(df_obj, parameter=list(sup=supportLevels[4], 
                                    conf=confidenceLevels[i], target="rules")))
  
}
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.9    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.7    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.4    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [12 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.3    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [74 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [81 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.1    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [81 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.9    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.7    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [1 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [7 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.4    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.3    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [128 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [135 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.1    0.1    1 none FALSE            TRUE       5    0.05      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 12 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [135 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.9    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [5 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [7 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.7    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [26 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [56 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [116 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.4    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [279 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.3    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [600 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [759 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.1    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [767 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.9    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [22 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [24 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.7    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [43 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.6    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [115 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [215 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.4    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [404 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.3    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [748 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [968 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.1    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 247 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [999 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
# Number of rules found with a support level of 10%
plot1 <- qplot(confidenceLevels, rules_sup10, geom=c("point", "line"), 
               xlab="Confidence level", ylab="Number of rules found", 
               main="Apriori with a support level of 10%") +
  theme_bw()

# Number of rules found with a support level of 5%
plot2 <- qplot(confidenceLevels, rules_sup5, geom=c("point", "line"), 
               xlab="Confidence level", ylab="Number of rules found", 
               main="Apriori with a support level of 5%") + 
  scale_y_continuous(breaks=seq(0, 10, 2)) +
  theme_bw()

# Number of rules found with a support level of 1%
plot3 <- qplot(confidenceLevels, rules_sup1, geom=c("point", "line"), 
               xlab="Confidence level", ylab="Number of rules found", 
               main="Apriori with a support level of 1%") + 
  scale_y_continuous(breaks=seq(0, 50, 10)) +
  theme_bw()

# Number of rules found with a support level of 0.5%
plot4 <- qplot(confidenceLevels, rules_sup0.5, geom=c("point", "line"), 
               xlab="Confidence level", ylab="Number of rules found", 
               main="Apriori with a support level of 0.5%") + 
  scale_y_continuous(breaks=seq(0, 130, 20)) +
  theme_bw()

# Subplot
library(gridExtra)
library(grid)
grid.arrange(plot1, plot2, plot3, plot4, ncol=2)
Warning message:
“package ‘gridExtra’ was built under R version 4.1.1”

Attaching package: ‘gridExtra’


The following object is masked from ‘package:dplyr’:

    combine
../_images/Lecture3_29_1.png
# Data frame
num_rules <- data.frame(rules_sup10, rules_sup5, rules_sup1, rules_sup0.5, confidenceLevels)

# Number of rules found with a support level of 10%, 5%, 1% and 0.5%
ggplot(data=num_rules, aes(x=confidenceLevels)) +
  
  # Plot line and points (support level of 10%)
  geom_line(aes(y=rules_sup10, colour="Support level of 10%")) + 
  geom_point(aes(y=rules_sup10, colour="Support level of 10%")) +
  
  # Plot line and points (support level of 5%)
  geom_line(aes(y=rules_sup5, colour="Support level of 5%")) +
  geom_point(aes(y=rules_sup5, colour="Support level of 5%")) +
  
  # Plot line and points (support level of 1%)
  geom_line(aes(y=rules_sup1, colour="Support level of 1%")) + 
  geom_point(aes(y=rules_sup1, colour="Support level of 1%")) +
  
  # Plot line and points (support level of 0.5%)
  geom_line(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  geom_point(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  
  # Labs and theme
  labs(x="Confidence levels", y="Number of rules found", 
       title="Apriori algorithm with different support levels") +
  theme_bw() +
  theme(legend.title=element_blank())
../_images/Lecture3_30_0.png

4. cSPADE algorithm with arulesSequences

Frequent Sequence Mining is used to discover a set of patterns shared among objects which have between them a specific order. For instance, students could enroll in multiple courses over different semester. In this case, we may use Frequent Sequence Mining to find that 40% of the students who enrolled in the STAT101 in term 1, continued to enroll in STAT 201 in term 2.

# create data semester 1
dt1 <- data.frame(matrix(ncol = 2, nrow = 1000))
colnames(dt1) <- c("sequenceID","item")

set.seed(123)
dt1$sequenceID <- sample(x=1:250,1000,replace = T)

set.seed(123)
dt1$item <- sample(x=c("PSY","MATH","STATS","ECON"),1000,replace=T)

dt1$eventID <- 1
dt1 <- unique(dt1)

# create data semester 2
dt2 <- data.frame(matrix(ncol = 2, nrow = 1000))
colnames(dt2) <- c("sequenceID","item")

set.seed(123)
dt2$sequenceID <- sample(x=1:250,1000,replace = T)

set.seed(123)
dt2$item <- sample(x=c("STATS","ECON","PHY","ENG","POLS"),1000,replace=T)

dt2$eventID <- 2
dt2 <- unique(dt2)

# # create data semester 3
dt3 <- data.frame(matrix(ncol = 2, nrow = 1000))
colnames(dt3) <- c("sequenceID","item")

set.seed(123)
dt3$sequenceID <- sample(x=1:250,1000,replace = T)

set.seed(123)
dt3$item <- sample(x=c("PHY","ENG","POLS","BIO","CHEM"),1000,replace=T)

dt3$eventID <- 3
dt3 <- unique(dt3)


dt <- rbind(dt1,dt2,dt3)
dt <- dt %>% arrange(sequenceID, eventID)
head(dt)
A data.frame: 6 × 3
sequenceIDitemeventID
<int><chr><dbl>
1941ECON 1
19411STATS2
19421PHY 3
3562ECON 1
8522MATH 1
35612ENG 2
dt <- dt %>% group_by(sequenceID, eventID) %>% 
  summarise(item = paste0(item, collapse = ","),size = n())

dt <- dt %>% select(sequenceID,eventID,size,item)
head(dt)
`summarise()` has grouped output by 'sequenceID'. You can override using the `.groups` argument.
A grouped_df: 6 × 4
sequenceIDeventIDsizeitem
<int><dbl><int><chr>
111ECON
121STATS
131PHY
212ECON,MATH
222ENG,POLS
232BIO,CHEM
library(arulesSequences)
write.table(dt, "course_term.csv", sep=";", row.names = FALSE, col.names = FALSE, quote = FALSE)
trans_matrix <- read_baskets("course_term.csv", sep = ";", info = c("sequenceID","eventID", "size"))
summary(trans_matrix)
Warning message in type.convert.default(X[[i]], ...):
“'as.is' should be specified by the caller; using TRUE”
transactions as itemMatrix in sparse format with
 738 rows (elements/itemsets/transactions) and
 291 columns (items) and a density of 0.003436426 

most frequent items:
     ECON       ENG  PSY,ECON     STATS STATS,PSY   (Other) 
       15        15        14        13        13       668 

element (itemset/transaction) length distribution:
sizes
  1 
738 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

includes extended item information - examples:
        labels
1          BIO
2     BIO,CHEM
3 BIO,CHEM,ENG

includes extended transaction information - examples:
  sequenceID eventID size
1          1       1    1
2          1       2    1
3          1       3    1
# Get frequent sequences and corresponding support values
s1 <- cspade(trans_matrix, parameter = list(support = 0.02), control = list(verbose = F))
s1.df <- as(s1, "data.frame")
summary(s1)
inspect(head(s1))
set of 55 sequences with

most frequent items:
     PHY PHY,POLS     POLS      ENG      BIO  (Other) 
       3        3        3        3        2       54 

most frequent elements:
     {ENG}      {PHY} {PHY,POLS}     {POLS}      {BIO}    (Other) 
         3          3          3          3          2         54 

element (sequence) size distribution:
sizes
 1  2 
42 13 

sequence length distribution:
lengths
 1  2 
42 13 

summary of quality measures:
    support       
 Min.   :0.02033  
 1st Qu.:0.02033  
 Median :0.02439  
 Mean   :0.02905  
 3rd Qu.:0.03252  
 Max.   :0.06098  

includes transaction ID lists: FALSE 

mining info:
         data ntransactions nsequences support
 trans_matrix           738        246    0.02
   items            support 
 1 <{BIO}>       0.03252033 
 2 <{CHEM}>      0.02845528 
 3 <{CHEM,BIO}>  0.02032520 
 4 <{CHEM,PHY}>  0.02032520 
 5 <{ECON}>      0.05284553 
 6 <{ECON,MATH}> 0.03252033 
 
#Convert Back to DS
itemsets_df <- as(s1, "data.frame") %>% as_tibble()
library(tidyverse)
#Top 10 Frequent Item Sets
itemsets_df %>%
  slice_max(support, n = 10) %>% 
  ggplot(aes(x = fct_reorder(sequence, support),
                    y = support,
                    fill = sequence)) + 
    geom_col() + 
    geom_label(aes(label = support %>% scales::percent()), hjust = 0.5) + 
    labs(x = "Site", y = "Support", title = "Most Frequently Visited Item Sets",
         caption = "**Support** is the percent of segments the contain the item set") + 
    scale_fill_discrete(guide = F) +
    scale_y_continuous(labels = scales::percent,
                       expand = expansion(mult = c(0, .1))) + 
    coord_flip() + 
    cowplot::theme_cowplot() 
Warning message:
“It is deprecated to specify `guide = FALSE` to remove a guide. Please use `guide = "none"` instead.”
../_images/Lecture3_37_1.png
rules <- ruleInduction(s1, 
                       confidence = 0.01, 
                       control = list(verbose = FALSE))

inspect(head(rules, 5))
   lhs                  rhs                 support confidence      lift 
 1 <{PHY,POLS}>      => <{POLS,CHEM}>     0.0203252  0.5000000 24.600000 
 2 <{PHY}>           => <{POLS}>          0.0203252  0.4545455  9.318182 
 3 <{STATS,PHY,ENG}> => <{PHY,POLS,BIO}>  0.0203252  1.0000000 49.200000 
 4 <{STATS,PHY}>     => <{PHY,POLS}>      0.0203252  1.0000000 24.600000 
 5 <{STATS,POLS}>    => <{PHY,CHEM}>      0.0203252  1.0000000 49.200000 
 
rules_cleaned <- rules[!is.redundant(rules)]
rules_df <- as(rules_cleaned, "data.frame") %>% 
  as_tibble() %>% 
  separate(col = rule, into = c('lhs', 'rhs'), sep = " => ", remove = F)

rules_df %>% 
  arrange(-confidence) %>% 
  select(lhs, rhs, support, confidence, lift) %>% 
  head() %>% 
  knitr::kable()
|lhs               |rhs              |   support| confidence|  lift|
|:-----------------|:----------------|---------:|----------:|-----:|
|<{STATS,PHY,ENG}> |<{PHY,POLS,BIO}> | 0.0203252|  1.0000000| 49.20|
|<{STATS,PHY}>     |<{PHY,POLS}>     | 0.0203252|  1.0000000| 24.60|
|<{STATS,POLS}>    |<{PHY,CHEM}>     | 0.0203252|  1.0000000| 49.20|
|<{ECON,PHY}>      |<{ENG,POLS}>     | 0.0243902|  1.0000000| 30.75|
|<{POLS,STATS}>    |<{CHEM,PHY}>     | 0.0203252|  1.0000000| 49.20|
|<{POLS,ENG}>      |<{CHEM,BIO}>     | 0.0203252|  0.8333333| 41.00|