Columbia COMS W4115 - MR Language Reference Manual

MR Language Reference Manual

Siyang Dai (sd2694)
Jinxiong Tan (jt2649)
Zhi Zhang (zz2219)
Zeyang Yu (zy2156)
Shuai Yuan (sy2420)

1. Introduction
   1.1 Concept of MapReduce
   1.2 Data-flow of MapReduce
   1.3 The MR Programming Language
   1.4 Input and Output of MR Program
2. Lexical Elements
   2.1 Tokens
   2.2 Constants
   2.3 Keywords
   2.4 Identifiers
   2.5 Operators
   2.6 Separators
   2.7 Comments
3. Data Types
   3.1 Int
   3.2 Double
   3.3 Boolean
   3.4 List
   3.5 Conversions
4. Program Structure
   4.1 Configuration
   4.2 Mapper/Reducer Definition
   4.3 Scope
5. Expression
   5.1 Operators
   5.2 Primary Expression
   5.3 Unary Negative Operator
   5.4 Binop Operation
   5.5 Split Operation
   5.6 Assignment Expression
   5.7 Declaration Expression
6. Statements
   6.1 Expression Statement
   6.2 Block statement
   6.3 Emit Statement
   6.4 Conditional Statement
   6.5 Iteration Statement
7. Reference

1. Introduction

MapReduce is a programming paradigm that supports distributed computing over large data sets on clusters of computers. The paradigm is inspired by the map and reduce functions widely used in functional programming. The MR programming language is designed specifically for MapReduce.

1.1 Concept of MapReduce

1.1.1 List Processing

Essentially, a MapReduce program converts lists of input data elements into lists of output data elements. The transformation is done in two phases: map and reduce.

1.1.2 Map

The first phase of a MapReduce program is called mapping. A list of data pairs is provided, one pair at a time, to a function called the Mapper, which transforms each input element individually into an output data element. Logically, a map function has the following form:

    Map(k1, v1) → list(k2, v2)

[Figure 1: Map] [1]

After that, all pairs with the same key from all lists generated by the map function are grouped together, creating one group for each distinct generated key. These groups are the input of the next phase.

1.1.3 Reduce

Reducing lets you aggregate values together. A reduce function receives a list of values with the same key.
It then combines these values together. Logically, a reduce function has the following form:

    Reduce(k2, list(v2)) → (k3, v3)

[Figure 2: Reduce] [1]

[1] Figures 1, 2, and 3 are from the Hadoop Tutorial on the Yahoo Developer Network.

As a result, we get one (k, v) pair for each distinct key generated by the map function.

1.2 Data-flow of MapReduce

Combining map and reduce gives the following overview of the data flow of a MapReduce program on a cluster of three nodes:

[Figure 3: MapReduce]

1.3 The MR Programming Language

MR is designed to support the MapReduce paradigm. It hides the details of the MapReduce framework from programmers, who only need to define a map function and a reduce function. The program is then run according to the MapReduce data flow.

1.4 Input and Output of MR Program

An MR program takes two arguments from the command line. The first is the input directory and the second is the output directory.

1.4.1 Input

All files under the input directory are used as input files. MR treats each line of each input file as a separate record and performs no parsing. It feeds the map function the byte offset of the line as the key and the line content as the value. Therefore, for the map function, k1 is always an integer and v1 is always one line of text.

1.4.2 Output

The output directory must not exist before the MR program runs; the MR program creates it automatically. The output of the reduce function is written to files under the output directory in the form "key \t value", one pair per line.

2. Lexical Elements

2.1 Tokens

There are five kinds of tokens in MR: literals, keywords, identifiers, operators, and other separators. Blanks, newlines, and comments are ignored during lexical analysis except insofar as they separate tokens.

2.2 Constants

2.2.1 Text Constant

A Text constant is a sequence of characters surrounded by a pair of double quotes, i.e. "...". For example, "hello world!" is a Text constant. Identical Text constants are the same.
All Text literals are immutable. Note that MR has no character type: even a single character is a Text constant, which can be regarded as an extended character set.

2.2.2 Int Constant

An Int constant is an integer consisting of a sequence of digits. Both signed and unsigned integers are supported. An Int constant cannot start with a 0 (digit zero). All integers are decimal (base 10) by default. For example, -15 and 2012 are valid Int constants.

2.2.3 Double Constant

In MR, a Double constant is a floating constant that consists of an integer part, a decimal point, and a fraction part. In addition, it supports an 'e' followed by an optionally signed integer exponent. The integer part and the fraction part can each be a single digit or a sequence of digits. Either of them may be missing, but not both. Likewise, either the decimal point or the e and the exponent (not both) may be missing. The following are valid Double constants:

    1.    0.5e15    .3e+3    .2    1e5

2.3 Keywords

The following words are reserved as keywords and cannot be used otherwise.

    Text    Int     Double    Boolean    List
    def     if      else      foreach    emit
    and     or      Mapper    Reducer    splitby
    true    false

2.4 Identifiers

Identifiers name variables, parameters, and functions. An identifier consists of a sequence of letters, digits, and the underscore _, but it must start with a letter. An identifier must not be one of the keywords listed above. Identifiers are case-sensitive.

2.5 Operators

An operator is a special token that performs an operation, such as addition or subtraction, on either one or two operands. More details are covered in a later section.

2.6 Separators

A separator separates tokens. Blanks, newlines, and comments are ignored during lexical analysis apart from separating tokens. The following separators are not ignored:

    ( ) < > { } ;

2.7 Comments

// indicates that the rest of the line is a comment (C++/Java style comment).

3. Data Types

3.1 Int

The 64-bit Int data type can hold integer values in the range −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

3.2 Double

The Double type covers magnitudes from 4.94065645841246544e-324d to 1.79769313486231570e+308d (positive or negative).

3.3 Boolean

A Boolean variable may take only the values true and false.

3.4 List

A list type is written List<T>; for example, List<Int> represents a list of Int values. A list has unlimited size.

3.5 Conversions

When a value of Double type is converted to Int type, the fractional part is discarded. When a value of integral type is
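The data flow of sections 1.1 and 1.4 can be sketched in Python as a rough illustration. This is not MR code, since the Mapper/Reducer syntax is not part of this excerpt. The word-count task and every function name below are hypothetical. The sketch also assumes that "key \t value" means key, tab, value, and it sorts keys only to make the output deterministic.

```python
from collections import defaultdict

def records(lines):
    """Yield (byte offset, line content) pairs, as section 1.4.1 describes."""
    offset = 0
    for line in lines:
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

def mapper(k1, v1):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for each word in the line."""
    return [(word, 1) for word in v1.split()]

def reducer(k2, values):
    """Reduce(k2, list(v2)) -> (k3, v3): sum the counts collected for one key."""
    return k2, sum(values)

def run(lines):
    """Map every record, group the emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for k1, v1 in records(lines):
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    # Sorting is a choice of this sketch, not something the manual specifies.
    return [reducer(k2, vs) for k2, vs in sorted(groups.items())]

def format_output(pairs):
    """Render each pair as key, tab, value (assumed reading of section 1.4.2)."""
    return [f"{k}\t{v}" for k, v in pairs]

if __name__ == "__main__":
    for out_line in format_output(run(["a b a\n", "b c\n"])):
        print(out_line)
```

The grouping step in `run` stands in for the work the framework does between the two phases; in MR the programmer would write only the equivalents of `mapper` and `reducer`.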
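The Double constant rule of section 2.2.3 can be approximated by a regular expression. The Python sketch below is one possible reading of that prose, not the scanner the authors wrote. The names `DOUBLE_RE` and `is_double_constant` are hypothetical. It assumes a lowercase 'e' only and no leading sign, since the section mentions neither.

```python
import re

# One reading of section 2.2.3:
#  - integer part and fraction part: either may be missing, but not both
#  - decimal point or exponent: either may be missing, but not both
DOUBLE_RE = re.compile(r"""
    (?: \d+ \. \d*      # integer part, point, optional fraction
      | \. \d+          # point and fraction, integer part missing
    )
    (?: e [+-]? \d+ )?  # optional exponent with optional sign
  | \d+ e [+-]? \d+     # no decimal point, so the exponent is required
""", re.VERBOSE)

def is_double_constant(text):
    """Return True if the whole string matches the Double constant sketch."""
    return DOUBLE_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["1.", "0.5e15", ".3e+3", ".2", "1e5", "15", "."]:
        print(sample, is_double_constant(sample))
```

Under this reading, the five examples listed in section 2.2.3 match, while a bare digit sequence such as 15 does not, because it lacks both the decimal point and the exponent.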

