Inductive Programming by Expectation Maximization Algorithm

Roger Hsiao
InterACT, Language Technologies Institute
Carnegie Mellon University
wrhsiao@cs.cmu.edu

Abstract

This paper proposes an algorithm which can write programs automatically to solve problems. We model the sequence of instructions as an n-gram language model, and the sequence is represented by hidden variables. The expectation maximization algorithm is applied to train the n-gram model and perform program induction. Our approach is flexible and can be applied to many problems; in this paper, we concentrate on function approximation and the traveling salesperson problem.

1 Introduction

This paper proposes a novel machine learning algorithm which can write programs automatically to solve problems. A program, in the context of this paper, is a sequence of user-defined instructions which operate on some continuous or discrete variables. The proposed algorithm allows users to plug in any instruction set, so it provides a flexible framework which can be applied to many kinds of problems. Formally speaking, given some data, this algorithm induces a program for a machine which comes with a user-defined instruction set and a memory module. Memory in this context is defined as a set of continuous or discrete variables, and the instruction set is a set of functions which operate on this memory. One can view today's PCs as examples of this kind of machine. The automatic procedure of inducing a program from data is known as program induction or inductive programming.

Different from the previous approaches discussed in [2, 6], this paper proposes a probabilistic framework for inductive programming. The induction process is driven by the expectation maximization (EM) algorithm [3], in which each program is modeled by an n-gram language model (LM) and the instruction sequences are hidden variables.

The ultimate goal of this project is to explore whether it is possible for a machine to write programs automatically, and if it is possible, what kinds of problems are suitable for this approach. We believe that this algorithm allows one to obtain a fairly good solution to a very difficult problem immediately, without serious study of the problem. As a preliminary study, we explore two applications: function approximation and the traveling salesperson problem (TSP). The function approximation problem studied in this paper is different from a standard approximation problem, since in addition to obtaining a good approximation, we are interested in whether the new algorithm can recover the true functional form of the target function. For TSP, an NP-hard problem, we interpret the tour, i.e. the sequence of cities, as a sequence of instructions. This interpretation allows us to use the EM algorithm to tackle TSP, which is difficult to solve in general.

This paper is organized as follows. Section 2 briefly describes existing work on program induction. In section 3, we present how to apply the EM algorithm to inductive programming. Section 4 describes the experiments, and finally we conclude and discuss future work.

2 Related Work

The term inductive programming sometimes refers to certain methodologies in software engineering, or to existing learning algorithms in machine learning [5, 7]. However, this paper adopts a more specific definition: design an algorithm which can write a program automatically given some information or data. One of the existing approaches to inductive programming uses a genetic algorithm [2], in
which a program is made up of instructions which are treated as an organism; through an evolution process, one can obtain a program for the designed task. Another approach to inductive programming uses functional programming languages: [6] suggests an algorithm which can induce a functional program by generalizing the training examples. However, a drawback of this approach, as mentioned in that paper, is that all training examples have to be correct, which is unlikely to hold in real applications.

3 Program Induction by EM Algorithm

Unlike previous approaches, this paper describes how to apply the EM algorithm to inductive programming. We propose a probabilistic framework for program induction. A machine is defined by a tuple (Φ, M), where Φ is the instruction set, a set of user-defined functions {f^1, ..., f^N}, and M is the memory, a set of continuous or discrete variables {m_1, ..., m_K}. A program is considered as a sequence of functions; a program with T steps is denoted by F = (f_1, ..., f_T). Functions are operators working on the variables in the memory, so given a program and an initial configuration of the memory, one can compute the final state of the memory by applying the user-defined functions, as illustrated in figure 1.

Figure 1: Program as a sequence of instructions.

A program is modeled by an n-gram language model. In other words, the probability of choosing a function at step t depends on the functions chosen in the previous n−1 steps, i.e. P(f_t | f_{t−1}, ..., f_{t−n+1}). Linear interpolation with lower-order n-gram probabilities is applied for smoothing, due to data sparseness for higher-order n-grams. In this paper, we call this n-gram language model the program model.

Given a problem with observable data O, we are interested in computing P(O). P(O|F) measures how strongly the program F supports the data O. The program F is a latent variable, so we do not observe F directly. Therefore, in order to maximize the log likelihood of the data, we apply the EM algorithm, which maximizes the expected complete log likelihood:

    Q(θ; θ^old) = E_F[ log P(O, F; θ) | O; θ^old ]
                = Σ_F P(F | O; θ^old) log P(O | F) + Σ_F P(F | O; θ^old) log P(F; θ)     (1)

where θ and θ^old are the parameters of the program model; θ^old corresponds to the parameters from the last EM iteration, and θ is the set of parameters to be estimated.

Because the number of possible programs is huge, equation (1) is not tractable unless we apply some approximation. Assuming all programs must have T steps, we can build a search tree from step 1 to T starting from the initial memory configuration, as shown in figure 2. To keep the tree small enough, we prune it using the probabilities from the program model, keeping a fixed number of active tokens during the search. This fixed number is called the prune width in this paper. Once we have the search tree, we assume that P(F|O) = 0 for every program F which is not in the tree. With this approximation, equation (1) becomes tractable.

Figure 2: Building a search tree.

3.1 E-step

During the E-step, one has to compute the posterior probability P(F | O; θ^old) for every program F kept in the search tree.
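To make the framework concrete, the sketch below implements the pipeline described above for a toy instruction set: executing a program on a memory, scoring programs with a bigram program model (n = 2), building the pruned search tree with a fixed prune width, and computing the E-step posteriors over the surviving programs. This is illustrative rather than the author's implementation: the instruction set, the Gaussian choice for P(O|F), and the names run_program, beam_search and posteriors are all assumptions, and the interpolation smoothing used in the paper is omitted.

```python
import math

# A machine: a user-defined instruction set operating on a memory.
# Memory is a list of continuous variables; each instruction maps memory -> memory.
INSTRUCTIONS = {
    "inc0":  lambda m: [m[0] + 1.0] + m[1:],   # increment cell 0
    "dbl0":  lambda m: [m[0] * 2.0] + m[1:],   # double cell 0
    "mov01": lambda m: [m[0], m[0]] + m[2:],   # copy cell 0 into cell 1
}

def run_program(program, memory):
    """Execute a program (a sequence of instruction names) on an initial memory."""
    for name in program:
        memory = INSTRUCTIONS[name](memory)
    return memory

def bigram_prob(f_t, f_prev, params):
    """Program model P(f_t | f_prev); unseen bigrams back off to uniform."""
    return params.get((f_prev, f_t), 1.0 / len(INSTRUCTIONS))

def likelihood(observed, memory):
    """P(O | F): how strongly the final memory supports the observation.
    A Gaussian score around memory cell 0 is assumed here."""
    return math.exp(-0.5 * (memory[0] - observed) ** 2)

def beam_search(T, init_memory, params, prune_width):
    """Build the T-step search tree, pruning by program-model probability
    so that at most `prune_width` tokens stay active per step."""
    beam = [([], 1.0, list(init_memory))]  # (program prefix, P(F) so far, memory)
    for _ in range(T):
        expanded = []
        for prog, p, mem in beam:
            prev = prog[-1] if prog else "<s>"
            for name, fn in INSTRUCTIONS.items():
                expanded.append((prog + [name],
                                 p * bigram_prob(name, prev, params),
                                 fn(list(mem))))
        expanded.sort(key=lambda x: x[1], reverse=True)
        beam = expanded[:prune_width]
    return beam

def posteriors(beam, observed):
    """E-step: P(F | O) over the surviving programs, proportional to P(F) P(O|F);
    programs pruned from the tree implicitly get posterior zero."""
    joint = [(prog, p_f * likelihood(observed, mem)) for prog, p_f, mem in beam]
    z = sum(w for _, w in joint) or 1.0
    return [(prog, w / z) for prog, w in joint]

if __name__ == "__main__":
    beam = beam_search(T=3, init_memory=[0.0, 0.0], params={}, prune_width=4)
    for prog, post in posteriors(beam, observed=4.0):
        print(prog, round(post, 3))
```

Note that, as in the paper, pruning uses only the program-model probabilities, while the data likelihood P(O|F) enters only when the posteriors are computed over the surviving programs.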
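Equation (1) also indicates what the corresponding M-step has to do: the first term does not depend on θ, so maximizing Q reduces to maximizing Σ_F P(F | O; θ^old) log P(F; θ), which for an n-gram program model is the standard maximum likelihood estimate with posterior-weighted counts. The paper's own M-step is not shown in this excerpt, so the following is a sketch under that standard assumption, reusing beam_search and posteriors from the sketch above and again ignoring smoothing.

```python
from collections import defaultdict

def m_step(weighted_programs):
    """Re-estimate the bigram parameters P(f_t | f_prev) as posterior-weighted
    relative frequencies, maximizing the second term of equation (1)."""
    counts = defaultdict(float)   # (f_prev, f_t) -> posterior-weighted count
    totals = defaultdict(float)   # f_prev -> posterior-weighted count
    for prog, post in weighted_programs:
        prev = "<s>"
        for name in prog:
            counts[(prev, name)] += post
            totals[prev] += post
            prev = name
    return {(prev, name): c / totals[prev] for (prev, name), c in counts.items()}

# One EM iteration per loop pass: build the pruned search tree with the current
# program model, compute the E-step posteriors, then re-estimate the model.
params = {}
for _ in range(5):
    beam = beam_search(T=3, init_memory=[0.0, 0.0], params=params, prune_width=4)
    params = m_step(posteriors(beam, observed=4.0))
```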