Description

This dataset describes a set of 102 molecules, of which 39 are judged by human experts to be musks and the remaining 63 are judged to be non-musks. The goal is to learn to predict whether new molecules will be musks or non-musks. However, the 166 features that describe these molecules depend upon the exact shape, or conformation, of the molecule. Because bonds can rotate, a single molecule can adopt many different shapes. To generate this dataset, all the low-energy conformations of the molecules were generated, producing 6,598 conformations. A feature vector was then extracted to describe each conformation. This many-to-one relationship between feature vectors and molecules is called the "multiple instance problem". When learning a classifier for this data, the classifier should classify a molecule as "musk" if ANY of its conformations is classified as a musk. A molecule should be classified as "non-musk" if NONE of its conformations is classified as a musk.
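The ANY/NONE rule for turning per-conformation predictions into a per-molecule label can be sketched in a few lines of Python. This is only an illustration of the aggregation step: the function name `classify_molecules`, the molecule identifiers, and the example predictions are made up for the sketch and are not part of the dataset or of any particular library.

```python
from typing import Dict, Iterable, Tuple


def classify_molecules(predictions: Iterable[Tuple[str, bool]]) -> Dict[str, bool]:
    """Aggregate per-conformation predictions into per-molecule labels.

    Each input item is (molecule_id, conformation_is_musk). A molecule is
    labelled musk (True) if ANY of its conformations was predicted musk,
    and non-musk (False) if NONE of them was.
    """
    molecule_is_musk: Dict[str, bool] = {}
    for molecule_id, conformation_is_musk in predictions:
        previous = molecule_is_musk.get(molecule_id, False)
        molecule_is_musk[molecule_id] = previous or conformation_is_musk
    return molecule_is_musk


# Hypothetical per-conformation predictions (not taken from the dataset).
example_predictions = [
    ("mol_a", False),
    ("mol_a", True),   # one positive conformation makes mol_a a musk
    ("mol_b", False),
    ("mol_b", False),  # no positive conformation, so mol_b is a non-musk
]
labels = classify_molecules(example_predictions)
print(labels)
```

Any per-conformation classifier could supply the boolean predictions; only the final aggregation step is shown here.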

Related Papers

  • Giorgio Valentini. An experimental bias-variance analysis of SVM ensembles based on resampling techniques. [link]
  • Zhi-Hua Zhou and Min-Ling Zhang. Ensembles of Multi-instance Learners. ECML. 2003. [link]
  • Zhi-Hua Zhou and Hua Zhou. Multi-Instance Learning: A Survey. National Laboratory for Novel Software Technology. [link]
  • Hendrik Blockeel and Luc De Raedt. Top-down Induction of Logical Decision Trees. Katholieke Universiteit Leuven Department of Computer Science. [link]
  • Qingping Tao and Stephen Scott and N. V. Vinodchandran and Thomas T. Osugi. SVM-based generalized multiple-instance learning via approximate box counting. ICML. 2004. [link]
  • Stephen D. Bay. Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets. ICML. 1998. [link]
  • Giorgio Valentini. Random Aggregated and Bagged Ensembles of SVMs: An Empirical Bias-Variance Analysis. Multiple Classifier Systems. 2004. [link]
  • Qingping Tao. Making Efficient Learning Algorithms with Exponentially Many Features. Ph.D. dissertation, Graduate College, University of Nebraska. 2004. [link]
  • Giorgio Valentini. Ensemble methods based on bias-variance analysis. Theses Series DISI-TH-2003. Dipartimento di Informatica e Scienze dell'Informazione. 2003. [link]
  • Zhi-Hua Zhou and Min-Ling Zhang. Neural Networks for Multi-Instance Learning. National Laboratory for Novel Software Technology, Nanjing University. [link]
  • Hendrik Blockeel and Luc De Raedt. Lookahead and Discretization in ILP. ILP. 1997. [link]
  • Zhi-Hua Zhou and Min-Ling Zhang. Solving Multi-Instance Problems with Classifier Ensemble Based on Constructive Clustering. National Laboratory for Novel Software Technology. [link]
  • Giorgio Valentini and Thomas G. Dietterich. Low Bias Bagged Support Vector Machines. ICML. 2003. [link]

Related datasets