No Code MapReduce

With AppSymphony Release 2.5 nearly complete – and having not posted a blog entry in over three months! – I plan to spend some time over the next several weeks highlighting AppSymphony’s most talked-about features. You’ll soon be able to try AppSymphony for yourself on the Amazon Web Services (AWS) Marketplace (more details soon!), but in the meantime I’d like to emphasize some of the coolness you’ll see once we launch there.

I was on a call several weeks ago with a potential (now actual) customer. We were discussing exactly what AppSymphony is, how it can be used to create flexible solutions for end-users, and, of most interest during the call, how it allows a user to graphically compose and execute powerful analytic apps without any coding. In particular, the discussion centered around graphically composing Apache Hadoop MapReduce jobs as part of an AppSymphony app. The question was, essentially, “by no coding required, do you really mean someone does some coding and then that code is subsequently executed by AppSymphony as part of an app?”  I was happy to explain that by no coding required, I really did mean no coding required. Luckily, I had a Hadoop 2.0/YARN-compatible MapReduce Word Count app that I was able to share in order to describe exactly how that works. Since I’m sure others may have a similar question about AppSymphony’s integration with Hadoop MapReduce, here’s a summary.

 

MapReduce Template

AppSymphony comes with hundreds of building blocks, which we call Components. Components can be stitched together to create Applications, which we call Apps. Provided the output of one Component matches the input of a second Component, those two Components can be stitched together (rest assured, there’s type checking and validation). One of the Components available in AppSymphony is the “MapReduce Template”.
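To make the "output must match input" rule concrete, here is a minimal Python sketch of type-checked stitching. It is only an illustration of the idea, not how AppSymphony is implemented: the `Component` class, the `stitch` function, and the two sample components are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a building block with a typed input and output."""
    name: str
    input_type: type
    output_type: type
    run: Callable


def stitch(first: Component, second: Component) -> Component:
    """Connect two components, refusing the connection if the types differ."""
    if first.output_type is not second.input_type:
        raise TypeError(
            f"cannot connect {first.name} ({first.output_type.__name__} out) "
            f"to {second.name} ({second.input_type.__name__} in)"
        )
    # The stitched result is itself a component, so it can be stitched again.
    return Component(
        name=f"{first.name} -> {second.name}",
        input_type=first.input_type,
        output_type=second.output_type,
        run=lambda value: second.run(first.run(value)),
    )


# Two illustrative components: str -> list, then list -> int.
split_words = Component("Split Words", str, list, lambda text: text.split())
count_items = Component("Count Items", list, int, len)

app = stitch(split_words, count_items)
word_total = app.run("to be or not to be")
```

Stitching the two in the opposite order, `stitch(count_items, split_words)`, raises a `TypeError` in this sketch, because an `int` output cannot feed a `str` input.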

Figure: MapReduce Template

As you can see, the MapReduce Template looks very much like any other AppSymphony Component. However, the MapReduce Template is a Composite Component, meaning it contains other Components. When you open the MapReduce Template, you see the Components it contains.

Figure: MapReduce Template Elements

The reason we call the MapReduce Template Component a “template” is that it contains a pre-configured set of supporting Components.  There’s one Component that describes the input format for the MapReduce job (Text Input, in this example), there’s one Component that describes the output format for the MapReduce job (Overwriting Text Output, in this example), and there are two other Composite Components – Map Composite and Reduce Composite.

The input format and output format Components are pluggable.  For example, you can swap out the “Text Input” Component in favor of a “Key-Value Input” Component.  In addition, though we’ve included a number of Apache Hadoop’s provided input and output format types, you can also create your own input or output format Components using the AppSymphony Component Development Kit (CDK).
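As a rough illustration of what "pluggable" means here, the Python sketch below swaps one record reader for another without touching the code that consumes the records. It assumes the “Text Input” and “Key-Value Input” Components behave like Hadoop's `TextInputFormat` (key is the byte offset of the line, value is the line) and `KeyValueTextInputFormat` (line split at the first tab); the function names are hypothetical and are not part of AppSymphony or the CDK.

```python
from typing import Callable, Iterator, Tuple

Record = Tuple[object, str]
InputFormat = Callable[[str], Iterator[Record]]


def text_input(data: str) -> Iterator[Record]:
    """Yield (byte offset of the line start, line text) for each line."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


def key_value_input(data: str, separator: str = "\t") -> Iterator[Record]:
    """Yield (key, value), splitting each line at the first separator."""
    for line in data.splitlines():
        key, _, value = line.partition(separator)
        yield key, value


def read_records(data: str, input_format: InputFormat) -> list:
    """The consumer only depends on the (key, value) contract, not the format."""
    return list(input_format(data))


sample = "apple\t3\nbanana\t5\n"
as_text = read_records(sample, text_input)        # keys are byte offsets
as_pairs = read_records(sample, key_value_input)  # keys come from the data
```

The point of the sketch is the shape of the contract: any reader that produces key/value records can be dropped in, which is the property a custom input format Component would need to preserve.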

 

Map Tasks and Reduce Tasks

The Map Composite and Reduce Composite contain the graphically composed descriptions of the Map Task and Reduce Task, respectively, that execute when the MapReduce job is submitted. Just like a regular (i.e. coded) MapReduce job, the Map Composite receives a Key and a Value as input, operates on the Key and/or Value, and produces a list of resulting Keys and Values. However, unlike a regular (i.e. coded) MapReduce job, I can easily modify the Map Task, re-execute the AppSymphony App, and completely change the functionality of my MapReduce job without re-writing, re-compiling, or re-deploying any source code!

Figure: MapReduce Map

The same is true for the Reduce Task.

Figure: MapReduce Reduce
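For comparison, here is roughly what the coded version of a Word Count job's Map and Reduce tasks looks like, written as a small self-contained Python simulation rather than against the real Hadoop API. The names `map_task`, `reduce_task`, and `run_job` are made up for this sketch, and the in-memory grouping step only stands in for Hadoop's shuffle and sort.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_task(key: int, value: str) -> List[Tuple[str, int]]:
    """Receive a Key (line offset) and Value (line text); emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_task(key: str, values: Iterable[int]) -> List[Tuple[str, int]]:
    """Receive a word and all of its counts; emit the word with their sum."""
    return [(key, sum(values))]


def run_job(records: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Run map, group the map output by key, then run reduce once per key."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_task(key, value):
            grouped[out_key].append(out_value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reduce_task(key, grouped[key]):
            results[out_key] = out_value
    return results


lines = [(0, "the quick brown fox"), (20, "jumps over the lazy dog")]
counts = run_job(lines)
```

In a coded job, changing the behavior (say, counting word lengths instead of words) means editing `map_task` and shipping the code again; the graphical Map and Reduce Composites are meant to replace exactly that edit-and-redeploy step.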

 

MapReduce Accessibility and Agility

As you can hopefully see, using AppSymphony to build analytic apps that leverage Hadoop MapReduce provides an awesome amount of flexibility. By dramatically reducing the time between adjusting a Map or Reduce task and observing the result, AppSymphony users are able to produce better results in less time. We’ve also found that the shortened feedback cycle and no-code approach give data experts (who may not have software engineering skills) the freedom to explore new algorithmic approaches to emerging problems and new data sets. In making MapReduce experimentation more flexible, it turns out we’ve also made it more accessible.

Stay tuned for more on Release 2.5 features and on our AWS Marketplace offering!
