# data mining discussion 3 2

Answer any one question:

Consider the following definition of an anomaly: An anomaly is an object that is unusually influential in the creation of a data model.

(a) Compare this definition to the standard model-based definition of an anomaly, and expound on the differences.

(b) For what sizes of data sets (small, medium, or large) is this definition appropriate?

Or

3. In one approach to anomaly detection, objects are represented as points in a multidimensional space, and the points are grouped into successive shells, where each shell represents a layer around a grouping of points, such as a convex hull. An object is an anomaly if it lies in one of the outer shells.

   (a) To which of the definitions of an anomaly in Section 10.1.2 is this definition most closely related?

   (b) Name two problems with this definition of an anomaly.

Or

4. Association analysis can be used to find anomalies as follows. Find strong association patterns, which involve some minimum number of objects. Anomalies are those objects that do not belong to any such patterns. To make this more concrete, we note that the hyperclique association pattern discussed in Section 6.8 is particularly suitable for such an approach. Specifically, given a user-selected h-confidence level, maximal hyperclique patterns of objects are found. All objects that do not appear in a maximal hyperclique pattern of at least size three are classified as outliers.

   (a) Does this technique fall into any of the categories discussed in this chapter? If so, which one?

   (b) Name one potential strength and one potential weakness of this approach.

Or

5. Discuss techniques for combining multiple anomaly detection techniques to improve the identification of anomalous objects. Consider both supervised and unsupervised cases: what would you do in the supervised case, and what in the unsupervised case?

Or

6. Describe the potential time complexity of anomaly detection approaches based on the following approaches: model-based using clustering, proximity-based, and density-based. No knowledge of specific techniques is required. Rather, focus on the basic computational requirements of each approach, such as the time required to compute the density of each object.
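As a starting point for the last question, it may help to look at what a brute-force computation has to do. The sketch below is an illustration under stated assumptions, not a technique taken from the text: the function names (`pairwise_distances`, `knn_distance_scores`, `density_scores`), the use of Euclidean distance, and the parameters `k` and `radius` are all hypothetical choices. It shows that, without a spatial index, both a proximity-based score and a simple density count rest on all pairwise distances, which is on the order of n² distance computations for n objects.

```python
import math

def pairwise_distances(points):
    """Brute-force Euclidean distance matrix: n*(n-1)/2 distance computations."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist

def knn_distance_scores(points, k):
    """Proximity-based score (assumed form): distance to the k-th nearest neighbor."""
    dist = pairwise_distances(points)
    scores = []
    for i, row in enumerate(dist):
        # Sorting each row adds an O(n log n) step per object.
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores

def density_scores(points, radius):
    """Density-based score (assumed form): count of other points within `radius`."""
    dist = pairwise_distances(points)
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        for i, row in enumerate(dist)
    ]

# Hypothetical toy data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
knn = knn_distance_scores(points, k=2)
dens = density_scores(points, radius=1.5)
```

On this toy data the distant point has the largest k-nearest-neighbor distance and the lowest neighbor count. For comparison when answering, note that one pass of a centroid-based clustering algorithm only compares each object to each cluster center, so its cost grows with the number of objects times the number of clusters rather than with the square of the number of objects.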

