Association Rules

Data mining is the process of discovering patterns and insights from large and complex datasets. One of the most common techniques in data mining is association rule mining, which aims to find relationships among items or attributes in a transactional database. For example, an association rule might reveal that customers who buy bread and milk also tend to buy eggs and cheese.


Association rules are usually expressed in the form X => Y, where X and Y are disjoint sets of items or attributes. The rule means that if a transaction contains X, then it is likely to contain Y as well. Two metrics are commonly used to measure the quality and usefulness of an association rule: support and confidence. Support is the proportion of all transactions that contain both X and Y, while confidence is the conditional probability of finding Y in a transaction given that it contains X, that is, the support of the rule divided by the support of X. For example, if 10% of transactions contain bread, milk, eggs and cheese, and 80% of the transactions that contain bread and milk also contain eggs and cheese, then the support of the rule {bread, milk} => {eggs, cheese} is 0.1 and its confidence is 0.8.
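The two definitions above can be sketched in a few lines of Python. This is only an illustration: the transactions are invented toy data, and the function names `support` and `confidence` are hypothetical helpers, not part of any library.

```python
# Toy transactions, invented for illustration only (assumption, not real data).
transactions = [
    {"bread", "milk", "eggs", "cheese"},
    {"bread", "milk", "eggs", "cheese"},
    {"bread", "milk", "eggs", "cheese"},
    {"bread", "milk", "eggs", "cheese"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"cheese", "eggs"},
    {"cheese"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as supp(X and Y) / supp(X)."""
    supp_x = support(antecedent, transactions)
    if supp_x == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / supp_x


x, y = {"bread", "milk"}, {"eggs", "cheese"}
print("support:", support(x | y, transactions))
print("confidence:", confidence(x, y, transactions))
```

In this toy data, 4 of the 10 transactions contain all four items and 5 contain both bread and milk, so the rule {bread, milk} => {eggs, cheese} has support 0.4 and confidence 0.8.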


However, support and confidence do not capture every aspect of an association rule. Some rules have high support and confidence yet are neither interesting nor novel; for a grocery store, the rule {bread} => {milk} is probably obvious. Confidence can also be high simply because Y is frequent on its own, whether or not X has anything to do with it. To address this issue, other metrics have been proposed to evaluate association rules, such as lift (also called interest factor), conviction, leverage and correlation. These metrics compare the actual co-occurrence frequency of X and Y with the frequency that would be expected from their individual frequencies if X and Y were independent. For example, lift is the support of X => Y divided by the product of the support of X and the support of Y. A lift greater than 1 indicates that X and Y are positively correlated, a lift less than 1 indicates that they are negatively correlated, and a lift of 1 indicates that they occur independently of each other.
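As a rough sketch, three of these metrics can be computed directly from supports. The formulas used here are the standard textbook definitions: lift = supp(X and Y) / (supp(X) * supp(Y)), leverage = supp(X and Y) - supp(X) * supp(Y), and conviction = (1 - supp(Y)) / (1 - confidence). The transactions are invented toy data and the function names are hypothetical.

```python
import math

# Toy transactions, invented for illustration only (assumption, not real data).
transactions = [{"bread", "milk"}] * 7 + [{"bread"}, {"milk"}, {"eggs"}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(x, y, transactions):
    """Return confidence, lift, leverage and conviction for the rule x => y.

    Assumes x and y each occur in at least one transaction.
    """
    supp_x = support(x, transactions)
    supp_y = support(y, transactions)
    supp_xy = support(frozenset(x) | frozenset(y), transactions)
    conf = supp_xy / supp_x
    return {
        "confidence": conf,
        # Observed co-occurrence relative to what independence would predict.
        "lift": supp_xy / (supp_x * supp_y),
        # Difference between observed and expected co-occurrence.
        "leverage": supp_xy - supp_x * supp_y,
        # Infinite when the rule never fails (confidence of exactly 1).
        "conviction": math.inf if conf == 1 else (1 - supp_y) / (1 - conf),
    }


for name, value in rule_metrics({"bread"}, {"milk"}, transactions).items():
    print(f"{name}: {value:.3f}")
```

In this toy data the rule {bread} => {milk} has a confidence of about 0.875 but a lift of only about 1.09, because milk already appears in 80% of all transactions. The rule looks strong by confidence alone, yet bread and milk co-occur only slightly more often than independence would predict.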


Association rule mining can be applied in many domains, such as market basket analysis, recommender systems, web usage mining, bioinformatics and text mining. However, it also comes with challenges and limitations. For example, the number of candidate itemsets grows exponentially with the number of distinct items, so finding all possible association rules in a large database can be computationally expensive and can generate a huge number of rules, many of them redundant or irrelevant. Therefore, efficient algorithms and pruning strategies are needed to reduce the search space and filter out uninteresting rules. Moreover, basic association rule mining does not take into account the temporal or sequential aspects of transactions, which might be important for some applications. For example, in web usage mining, it might be more useful to find patterns in user navigation paths than static sets of visited web pages. Extensions and variations of association rule mining have been developed to address such limitations, including sequential pattern mining, temporal association rule mining and generalized association rule mining.
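To make the idea of pruning concrete, here is a minimal sketch of a level-wise search in the style of the Apriori algorithm, which the text above does not name and which is chosen here only as a familiar example. It relies on the fact that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded without counting it. The function name `frequent_itemsets`, the toy transactions and the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search returning {itemset: support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = supp(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: only the surviving candidates are checked against the data.
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions, invented for illustration only (assumption, not real data).
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
for itemset, s in sorted(frequent_itemsets(baskets, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, the three single items and the three pairs are frequent in this toy data, while the full triple appears in only 2 of 5 baskets and is dropped. Rules would then be generated only from the frequent itemsets, which is what keeps the search space manageable.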


In conclusion, association rule mining is a powerful and widely used technique in data mining that can reveal interesting and useful patterns from transactional databases. However, it also has some drawbacks and limitations that need to be considered when applying it to real-world problems. Therefore, researchers and practitioners should be aware of the advantages and disadvantages of association rule mining, as well as its extensions and variations.

