Inductive Programming by Expectation Maximization Algorithm

Roger Hsiao
InterACT, Language Technologies Institute
Carnegie Mellon University
[email protected]

Abstract

This paper proposes an algorithm which can write programs automatically to solve problems. We model the sequence of instructions as an n-gram language model, and the sequence is represented by hidden variables. The expectation maximization algorithm is applied to train the n-gram model and perform program induction. Our approach is flexible and can be applied to many problems. In this paper, we concentrate on function approximation and the traveling salesperson problem.

1 Introduction

This paper proposes a novel machine learning algorithm which can write programs automatically to solve problems. A program, in the context of this paper, is a sequence of user-defined instructions which operate on some continuous or discrete variables. The proposed algorithm allows users to plug in any instruction set; as a result, it provides a flexible framework that can be applied to many kinds of problems.

Formally speaking, given some data, this algorithm induces a program for a machine which comes with a user-defined instruction set and a memory module. Memory in this context is defined as a set of continuous or discrete variables. The instruction set is a set of functions which operate on this memory. Today's PCs are examples of this kind of machine. The automatic procedure of inducing a program from data is known as program induction or inductive programming.

Different from the previous approaches discussed in [2][6], this paper proposes a probabilistic framework for inductive programming. The induction process is driven by the expectation maximization (EM) algorithm [3].
Each program is modeled by an n-gram language model (LM), and the instruction sequences are hidden variables.

The ultimate goal of this project is to explore whether it is possible for a machine to write programs automatically, and if so, what kinds of problems are suited to this approach. We believe this algorithm allows one to obtain a fairly good solution to a very difficult problem immediately, without serious study of the problem. As a preliminary study, we explore two applications: function approximation and the traveling salesperson problem (TSP). The function approximation problem studied in this paper differs from a standard approximation problem: in addition to obtaining a good approximation, we are interested in whether the new algorithm can recover the true functional form of the target function. For TSP, an NP-hard problem, we interpret the tour, i.e. the sequence of cities, as a sequence of instructions. This interpretation allows us to apply the EM algorithm to TSP, which is difficult to solve in general.

This paper is organized as follows: Section 2 briefly describes existing work on program induction. Section 3 presents how to apply the EM algorithm to inductive programming. Section 4 covers the experiments, and finally we conclude and discuss future work.

2 Related Work

The term "inductive programming" sometimes refers to methodologies in software engineering or to existing learning algorithms in machine learning [5][7].
However, this paper adopts a more specific definition: design an algorithm which can write a program automatically, given some information or data.

One existing approach to inductive programming uses a genetic algorithm [2], in which a program is made up of instructions and treated as an organism. Through an evolutionary process, one can obtain a program for the designated task.

Another approach to inductive programming uses functional programming languages. [6] suggests an algorithm which can induce a functional program by generalizing from training examples. However, a drawback of this approach, as mentioned in that paper, is that all training examples have to be correct, which is unlikely to be true in real applications.

3 Program Induction by EM Algorithm

Unlike previous approaches, this paper describes how to apply the EM algorithm to inductive programming. We propose a probabilistic framework for program induction.

A machine is defined by a tuple (Φ, M), where Φ is the instruction set, a set of user-defined functions {f1, . . . , fN}, and M is the memory, a set of continuous or discrete variables {m1, . . . , mK}. A program is a sequence of functions: a program with T steps is denoted F = (f1, . . . , fT). Functions are operators working on the variables in the memory, so given a program and an initial configuration of the memory, one can compute the final state of the memory by applying the user-defined functions, as illustrated in Figure 1.

Figure 1: Program as a sequence of instructions.

A program is modeled by an n-gram language model. In other words, the probability of choosing a function at step t depends on the functions chosen in the previous n − 1 steps, i.e. P(f_t | f_{t−1}, . . . , f_{t−n+1}). Linear interpolation with lower order n-gram probabilities is applied for smoothing, due to data sparseness for higher order n-grams.
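The machine abstraction above can be sketched in a few lines: the instruction set Φ is a dictionary of user-defined functions and the memory M is a list of variables; running a program just applies each instruction in turn. The concrete instructions (add, mul, inc) and the two-variable memory are illustrative assumptions, not the instruction set used in the paper.

```python
def run_program(program, memory, instruction_set):
    """Apply each instruction in turn to the memory and return the final state."""
    state = list(memory)
    for name in program:
        state = instruction_set[name](state)
    return state

# Hypothetical instruction set Phi operating on a two-variable memory [m1, m2].
PHI = {
    "add": lambda m: [m[0] + m[1], m[1]],   # m1 <- m1 + m2
    "mul": lambda m: [m[0] * m[1], m[1]],   # m1 <- m1 * m2
    "inc": lambda m: [m[0], m[1] + 1],      # m2 <- m2 + 1
}

# A program F = (f1, ..., fT) is just a sequence of instruction names.
final = run_program(["inc", "mul", "add"], [2.0, 3.0], PHI)
```

Starting from memory [2, 3], the three steps yield [2, 4], then [8, 4], then [12, 4], matching the step-by-step execution depicted in Figure 1.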
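The interpolated n-gram model described above can be sketched as follows for n = 3: the probability of the next instruction mixes trigram, bigram and unigram maximum-likelihood estimates. The interpolation weights are illustrative placeholders; in practice they would be tuned, e.g. on held-out data.

```python
from collections import Counter

def ngram_counts(programs, n):
    """Count n-grams over instruction sequences, padded with a start symbol."""
    counts = Counter()
    for prog in programs:
        padded = ["<s>"] * (n - 1) + list(prog)
        for i in range(len(prog)):
            counts[tuple(padded[i:i + n])] += 1
    return counts

class ProgramModel:
    def __init__(self, programs, weights=(0.6, 0.3, 0.1)):
        self.w3, self.w2, self.w1 = weights   # hypothetical weights
        self.c3 = ngram_counts(programs, 3)
        self.c2 = ngram_counts(programs, 2)
        self.c1 = ngram_counts(programs, 1)
        self.total = sum(self.c1.values())

    def prob(self, f, history):
        """P(f_t | f_{t-1}, f_{t-2}) by linear interpolation of ML estimates."""
        h = (["<s>", "<s>"] + list(history))[-2:]
        c_tri_ctx = self.c2[(h[0], h[1])]
        c_bi_ctx = self.c1[(h[1],)]
        p3 = self.c3[(h[0], h[1], f)] / c_tri_ctx if c_tri_ctx else 0.0
        p2 = self.c2[(h[1], f)] / c_bi_ctx if c_bi_ctx else 0.0
        p1 = self.c1[(f,)] / self.total
        return self.w3 * p3 + self.w2 * p2 + self.w1 * p1
```

The lower-order terms guarantee a nonzero probability for any instruction seen at least once, even when the exact trigram context never occurred in training.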
In this paper, we call this n-gram language model the program model.

Given a problem with observable data O, we are interested in computing P(O). P(O|F) measures how strongly the program F supports the data O. The program F is a latent variable, so we do not observe F directly. Therefore, in order to maximize the log likelihood of the data, we apply the EM algorithm, which maximizes the expected complete log likelihood,

  Q(θ, θ_old) = E_F[ log P(O, F | θ) | O, θ_old ]
              = Σ_F P(F | O, θ_old) log P(O | F, θ) + Σ_F P(F | O, θ_old) log P(F | θ) ,   (1)

where θ and θ_old are the parameters of the program model: θ_old corresponds to the parameters from the last EM iteration and θ denotes the parameters to be estimated. Because the number of possible programs is huge, equation 1 is not tractable unless we apply some approximation.

Assuming all programs have exactly T steps, from the initial memory configuration we can build a search tree from step 1 to T, as shown in Figure 2. In order to keep the tree small enough, we prune the tree using the probabilities from the program model, keeping a fixed number of active tokens during the search. This fixed number is named the prune width in this paper. Once we get the search tree, we assume that for
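The pruned search described above can be sketched as a beam search over programs: all partial programs are extended one step at a time, scored by the program model, and only the `prune_width` highest-scoring active tokens survive each step. The tiny hand-set bigram table stands in for a trained program model and is purely illustrative.

```python
import math

# Hypothetical bigram program model P(next | previous); numbers are made up.
BIGRAM = {
    "<s>": {"add": 0.5, "mul": 0.3, "inc": 0.2},
    "add": {"add": 0.2, "mul": 0.5, "inc": 0.3},
    "mul": {"add": 0.4, "mul": 0.4, "inc": 0.2},
    "inc": {"add": 0.3, "mul": 0.3, "inc": 0.4},
}

def search_programs(T, prune_width):
    """Return the surviving length-T programs with their log probabilities."""
    beam = [((), 0.0)]  # (program so far, log probability)
    for _ in range(T):
        expanded = []
        for prog, logp in beam:
            prev = prog[-1] if prog else "<s>"
            for f, p in BIGRAM[prev].items():
                expanded.append((prog + (f,), logp + math.log(p)))
        # Prune: keep only a fixed number of active tokens (the prune width).
        expanded.sort(key=lambda x: x[1], reverse=True)
        beam = expanded[:prune_width]
    return beam

survivors = search_programs(T=3, prune_width=4)
```

With a prune width of w and N instructions, each step expands at most w * N tokens, so the tree never grows beyond a constant size per step, at the cost of possibly discarding the globally best program.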