Wednesday, 28 June 2017

Basic tools and skills programmers need in the cloud: Part One

The Logic Tier

The logic tier is made up of two distinct parts: the application language and the application framework. The easiest way to understand the two is by analogy with everyday language. The programming language is the vocabulary, the words we use every day. The programming framework, on the other hand, is the set of rules, methods and techniques for putting those words together: a combination of grammar, commonly used phrases and the like.
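
To make the analogy concrete, here is a minimal sketch of a toy "framework" written in Python. The names `route`, `dispatch` and the `/hello` path are hypothetical and exist only for this illustration; real web frameworks are far richer, but the division of labor is similar: the language supplies functions and strings, and the framework supplies the rules for arranging them.

```python
# A toy "framework": a convention for wiring request paths to handler functions.
# Everything here (ROUTES, route, dispatch, "/hello") is hypothetical.

ROUTES = {}


def route(path):
    """Register the decorated function as the handler for `path`."""
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register


def dispatch(path):
    """Call the handler registered for `path`, or report that none exists."""
    handler = ROUTES.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()


# Application code: ordinary language features (a function returning a string)
# arranged according to the framework's rule (the @route decorator).
@route("/hello")
def hello():
    return "Hello from the logic tier"
```

Calling `dispatch("/hello")` runs the `hello` function, while a path with no registered handler falls through to the 404 message.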

There is a huge variety of languages, but some of the more widely used ones, as detailed by Barton George [6], include:

• Java/.NET – The incumbent enterprise development languages. Very powerful but relatively difficult to learn and time-consuming to program in.

• C++ – A statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it combines both high-level and low-level language features.

• Dynamic languages – These are popular for creating web applications since they are both simpler to learn and faster to code in than the traditional enterprise languages. This offers a substantial time-to-market advantage.

• PHP – A server-side scripting language originally designed for web development to produce dynamic web pages. PHP is known for being very quick and easy to get started with, but also for turning into a mess of “spaghetti code” after years of work by many different programmers.

• Perl – One of the original programming languages of the web, Perl emphasizes a very “Unix way” of programming. Perl can be quick and elegant but, like PHP, can result in a pile of hard-to-maintain code in the long term. Perl was extremely popular in the first Internet bubble, but it has since taken a back seat to more popular languages and frameworks such as PHP, Java, and Rails.

• Python – Like all dynamic languages, Python emphasizes speed of development and code readability. Python emphasizes breadth of functionality while at the same time being a proper, object-oriented programming language.

• JavaScript – Once a minor language used in web browsers, JavaScript has become a stand-alone language known and used by many programmers. Most web applications will include the use of JavaScript.

• Ruby – Ruby and Python are very similar in ethos, emphasizing fast coding with a more human-readable syntax. Ruby became famous with the rise of Rails in the mid-2000s and is still very popular. Ruby can also be run on top of the Java Virtual Machine, providing a good bridge to the Java world.

• Scala – A somewhat exotic language, Scala is well suited to massive-scale systems that need to be concurrent. Scala runs on the Java Virtual Machine. Interestingly, Twitter moved much of its back-end systems from Ruby to Scala as it sought to handle scaling issues.

• R – A programming language and software environment for statistical computing and graphics.

• Node.js (aka “Node”) – Node takes JavaScript, which was originally designed to be used in web browsers, and uses it as a server-side environment. It is intended for writing scalable network programs such as web servers.

• Clojure – A recent dialect of the Lisp programming language, Clojure is good for data-intensive applications. It runs on the Java Virtual Machine and Common Language Runtime.

The Data Tier 

The data tier is the foundation of a database application; it is the tier that manages the data to be created and consumed by the logic tier. At the data tier, developers are concerned with the storage and retrieval of data as well as managing the access to the data from potentially multiple different areas of the logic tier.

At the data tier, developers will be thinking about the makeup of their data in terms of both quantity (the volume of data items and the velocity at which data must be processed) and shape (is it structured data, i.e., able to be organized into rows and columns, or is it unstructured data, where there is no comfortable way of structuring the different data points?).
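
The difference between the two kinds of data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, the sample records and the `fields_of` helper are invented for the example, SQLite stands in for a relational database, and plain JSON strings stand in for the documents a NoSQL store might hold.

```python
import json
import sqlite3

# Structured data: every record has the same columns, so it fits a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Linus", "Helsinki")],
)
# Because the shape is fixed, a query can filter on any column by name.
london_names = [
    row[0]
    for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("London",))
]

# Unstructured (or loosely structured) data: each record carries its own shape.
documents = [
    json.dumps({"type": "tweet", "text": "Scaling is hard", "tags": ["ops"]}),
    json.dumps({"type": "log", "level": "WARN", "message": "disk at 91%"}),
]


def fields_of(raw_document):
    """Return the sorted field names found in one JSON document."""
    return sorted(json.loads(raw_document).keys())


# The two documents do not share a common set of fields.
shapes = [fields_of(doc) for doc in documents]
```

The table rejects nothing here, but it does force every customer into the same three columns. The documents, by contrast, have different field sets, which is the situation where a rows-and-columns layout becomes uncomfortable.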

Depending on the requirements of the particular application, developers will either be looking at using one of a variety of relational databases for structured data or a so-called NoSQL database for unstructured data. Some examples of these two database types are as follows:

Relational Databases

• MySQL – The most popular open source Relational Database Management System (RDBMS)

• Drizzle – A version of MySQL that is specifically targeted at the cloud

• PostgreSQL (Postgres) – An object-relational database management system

• Oracle DB

• SQL Server

NoSQL Databases

• MongoDB – An open source, high-performance database written in C++

• Riak – A NoSQL database/datastore written in Erlang

• Couchbase – A database powered by Apache CouchDB

• Cassandra – A scalable NoSQL database with no single points of failure; distributed and high performance

• Mahout – A scalable machine learning and data mining library

Alongside these database options, developers often use systems that distribute the processing of large data sets across many machines. MapReduce, as implemented on the Hadoop platform, is a programming framework for writing applications that process large amounts of data in parallel. Hadoop is an open source platform well suited to processing large volumes of unstructured data; it schedules and manages MapReduce jobs across a large number of individual servers.
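
The MapReduce idea itself is small enough to sketch in plain Python. The example below is not the Hadoop API; it runs in a single process, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for this illustration. What it shows is the three-step shape of a job: map each input to key/value pairs, group the values by key, then reduce each group to a result. On a real cluster the map and reduce steps would run as separate tasks spread across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents in one process."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the cloud", "The data tier", "the logic tier"])
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, the work can be split across machines without the pieces needing to coordinate, which is what makes the model a good fit for large data sets.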

There are a number of Hadoop utilities for different purposes, as Barton George details in a recent blog post [7]. Examples of these include:

• HBase – The Hadoop database that supports structured data storage for large tables; provides real time read/write access to big data

• Hive – A data warehousing solution built on top of Hadoop

• Pig – A platform for analyzing large data that leverages parallel computation

• ZooKeeper – Allows Hadoop administrators to track and coordinate distributed applications

• Oozie – A workflow engine for Hadoop

• Flume – A service designed to collect data and put it into a Hadoop environment

• Whirr – A set of libraries for running cloud services

• Sqoop – A tool designed to transfer data between Hadoop and relational databases


• Hue – A browser-based desktop interface for interacting with Hadoop


REFERENCE

Ben Kepes, Rackspace. More information on Ben and Diversity Limited can be found at http://diversity.net.nz
