Columbia COMS W4115 - Stella - An Environment for Experimental Machine Learning


Stella: An Environment for Experimental Machine Learning
Antonio Kantek
CVN Student

1. Introduction

1. Machine Learning

Machine Learning covers a variety of topics. In this work, Machine Learning (or simply ML) stands for data analysis using computational statistics. Data analysis can be understood as the process of extracting a set of patterns (knowledge) from raw data (information). Extracting information from raw data is comparable to running a SQL query against a relational database, a task related to low-level operational support (e.g. select all customers whose birthday is 10/31/75). Extracting a set of patterns from a dataset, in contrast, means creating ML models that discover rules and associations hidden in the data (e.g. if the customer is male and age < 15, then the customer buys videogame xyz, with confidence = 80% and support = 75%). Implementing ML models is both an iterative and an interactive (semi-automatic) process.

There are three main types of ML models: i) classifiers (predictors of both nominal and numeric attributes), ii) clustering and iii) association rule learners. They all share the same input: a dataset composed of instances. Each instance (or row) is an array of numbers, strings or dates.

A classifier (also known as Classification Learning) is an ML model that predicts outcomes for new (previously unseen) data. For example, in medical diagnosis a classifier predicts whether or not a patient has a given disease. The outcome (the class to predict) can be nominal (like buy or not buy) or a numeric quantity. Bayesian Networks [1] and C4.5 [2] are both well-known classifiers.

Clustering is the second type of ML model. It is similar to Classification Learning, but the attribute to predict is defined by the model itself rather than by the user. This way, the model groups similar instances into classes (here, a class represents the relationship between the predictor attributes and the goal attribute values).
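The support and confidence figures in the association-rule example above can be computed directly from a dataset of instances. The following Java sketch is illustrative only: the Customer record, its fields and the four-row toy dataset are hypothetical and are not part of Stella's API. It assumes the usual definitions used with Apriori-style rule learners: support is the fraction of all instances that satisfy both the rule's condition and its conclusion, and confidence is the fraction of instances satisfying the condition that also satisfy the conclusion.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

public class RuleMetrics {
    // Hypothetical instance type: one row of a customer dataset (not Stella's Instance class).
    record Customer(String gender, int age, boolean boughtGame) {}

    // Support: fraction of all instances satisfying both the condition and the conclusion.
    static <T> double support(List<T> data, Predicate<T> condition, Predicate<T> conclusion) {
        long both = data.stream().filter(condition.and(conclusion)).count();
        return (double) both / data.size(); // NaN for an empty dataset
    }

    // Confidence: fraction of instances satisfying the condition that also satisfy the conclusion.
    static <T> double confidence(List<T> data, Predicate<T> condition, Predicate<T> conclusion) {
        long matches = data.stream().filter(condition).count();
        if (matches == 0) {
            return Double.NaN; // the rule never applies, so confidence is undefined
        }
        long both = data.stream().filter(condition.and(conclusion)).count();
        return (double) both / matches;
    }

    public static void main(String[] args) {
        // Toy data invented for illustration.
        List<Customer> data = List.of(
                new Customer("male", 12, true),
                new Customer("male", 14, true),
                new Customer("male", 13, false),
                new Customer("female", 30, false));

        // Rule: if customer is male and age < 15 then customer buys the game.
        Predicate<Customer> maleUnder15 = c -> c.gender().equals("male") && c.age() < 15;
        Predicate<Customer> buysGame = Customer::boughtGame;

        System.out.printf(Locale.ROOT, "support=%.2f confidence=%.2f%n",
                support(data, maleUnder15, buysGame),
                confidence(data, maleUnder15, buysGame));
    }
}
```

On the toy data, three customers satisfy the condition and two of those also buy the game, so the sketch prints support=0.50 confidence=0.67 (2/4 and 2/3). Because every instance counted for support also satisfies the condition, confidence can never be lower than support under these definitions.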
Some clustering algorithms (like K-Means [3]) use the geometric distance between instances to group the closest ones.

The last type of ML model is the association rule, which concerns describing the structure of the data rather than predicting a class. Rules are commonly represented as if-then rules. A Decision Tree is a data structure composed of several joined if-then rules. In a Decision Tree, each internal node contains a test on a predictor attribute (e.g. attribute: customer age < 15, yes/no) and each leaf node represents the classification of an instance (e.g. decision: customer buys, yes/no). Apriori [4] is a popular algorithm for association rule extraction.

2. Motivation

Discovering patterns in data is a semi-automatic (empirical) process. No algorithm is universally best across all datasets: datasets differ in their attribute types, some having more numeric attributes and others more nominal ones. Some algorithms perform well on one type of dataset and disastrously on another; they are biased according to the type of data they process.

Stella is a dynamic language for ML model implementation. A dynamic language is the best tool for experimental computing: such languages provide a simple way to load and unload data structures (e.g. easy dynamic class loading), so models are easily implemented and tested. Running a piece of code should be as easy as running a SQL script.

The two most common approaches to data analysis using ML are specialized query languages and frameworks written in general-purpose languages such as C/C++ and Java. Commercial database products like Microsoft SQL Server provide some sort of data mining query language [5]. This is a very limited solution, since users cannot build their own models.
OR-Objects [6], Oracle Java Data Mining (OJDM) [7] and Weka (Waikato Environment for Knowledge Analysis) [8] are examples of the second approach (frameworks for ML). Weka is a superb, well-documented generic framework for ML, written in Java. Java is not as static as C++, but it is still not dynamic enough: classes cannot be loaded and unloaded in Java without dealing with ClassLoader issues.

Stella is a Domain Specific Language for implementing (and testing) ML models. It is an object-oriented language, but not an "object-oriented obsessed" one like Smalltalk. Stella's API consists of two parts: a small generic API (e.g. generic types like Integer, Double, String, Date and Object) and an extensive ML API (e.g. classes like DataSet, Classifier, ClassifierEvaluation, Instance, and so on). In addition, the language offers declarative constructs for some tedious tasks and, of course, good support for array manipulation.

2. Language Reference

1. Constants, Enumerations and Functions

Constants contain immutable values. Functions in Stella are defined in the same way as in (non-OOP) procedural languages like Pascal or C. Constants and functions are the easiest way to declare and implement mathematical functions. Some common functions will be natively implemented in Java. I/O is also done through functions. An enumeration, as in C, defines a sequence of elements.

Examples of constants, enumerations and functions:

    enum AttributeType { NOMINAL, NUMERIC, DATE };
    constant double SMALL := 1.e-6;
    constant double NORMAL_DISTRIBUTION := sqrt(2 * PI);
    function void out(Object obj); //Console output function
    function Object fout(Object obj, String file); //File output function
    function double min(double[] doubles) {
        check notNull(doubles, "doubles is null");
        check notEmpty(doubles, "doubles is empty");
        double min; //Default value for numbers is NaN
        foreach(doubles[i]) {
            if (min == NaN || min > doubles[i]) {
                min := doubles[i];
            }
        }
        ^ min; //^ returns the result, as in Smalltalk
    }

2. Classes and Objects

The main API consists of a few classes and some special constructs for dealing with numbers and arrays. All classes inherit from class Object (this does not need to be declared). Object is a virtual class (the only one), so no one can directly create an instance of Object. The main API does not directly support metaclasses, introspection or reflection. Methods whose names start with # are class methods (static methods in Java). Strings, dates and numbers are immutable objects.

Overview of the main classes (some methods are missing):

    class Object {
        Object deepCopy();
        Object shallowCopy();
        boolean equals(Object obj);
        String

