Saturday, December 18, 2010

SOA Governance

As the title of this post suggests, I'm going to share some insight into SOA governance. The SOA architectural world divides governance into two categories:

Design time governance
Runtime governance

Now let's talk about design time governance.

Design time service governance, as the name implies, typically provides an integrated registry/repository that attempts to manage a service from its design to its deployment, but typically not during runtime execution of the services, albeit some do. Key components of design time service governance include

- A registry and/or repository for tracking service design, management, policy, security, and testing artifacts

- Design tools, including service modeling, dependency tracking, policy creation and management, and other tools that assist in the design of services

- Deployment tools, including service deployment, typically through binding with external development environments

- Links to testing tools and services, providing the developer/designer the ability to create a test plan and testing scenarios and then to leverage service-testing technology

In essence, design time service governance works up from the data to the services, gathering key information as it goes. You typically begin by defining the underlying data schema and turning that into metadata and perhaps an abstraction of the data. Then, working up from there, you further define the services that interact with the data, data services, and then transactional services on top of that. You can further define that into processes or orchestration. All this occurs with design time information managed within the design time service governance system.

Runtime service governance works and plays in the world of service management and should be linked with design time service governance, but often is not. Design time is all about defining the policies around the use of services; runtime governance is the process of enforcing and implementing those policies at service runtime, though it may do other things as well.

Runtime service governance, like design time service governance, comes in many flavors because of the number of vendors in that space and how it is defined by each vendor. There are no de facto standards as to what runtime service governance needs to be, but certain patterns are emerging.

Runtime service governance typically includes

Service discovery

Service delivery


Setting and maintaining appropriate service levels

Managing errors and exceptions

Enabling online upgrades and versioning

Service validation

Auditing and logging
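To make the idea of enforcing a design time policy at runtime a bit more concrete, here is a minimal Java sketch. The names (GovernedService, allowedCallers, auditLog) are hypothetical and do not come from any governance product; the sketch only shows a policy check, an error path, and an audit entry wrapped around a service call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical wrapper: enforces a caller policy and records an audit trail.
class GovernedService {
    private final Set<String> allowedCallers;        // policy defined at design time
    private final Function<String, String> service;  // the actual service logic
    private final List<String> auditLog = new ArrayList<>();

    GovernedService(Set<String> allowedCallers, Function<String, String> service) {
        this.allowedCallers = allowedCallers;
        this.service = service;
    }

    String invoke(String caller, String request) {
        // Policy enforcement: reject callers the policy does not list.
        if (!allowedCallers.contains(caller)) {
            auditLog.add("DENIED " + caller);
            throw new SecurityException("caller not allowed: " + caller);
        }
        try {
            String response = service.apply(request);
            auditLog.add("OK " + caller);
            return response;
        } catch (RuntimeException e) {
            // Error management: record the failure before passing it on.
            auditLog.add("ERROR " + caller + " " + e.getMessage());
            throw e;
        }
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

A real runtime governance product would load the policy from the registry/repository rather than hard-code it, which is where the link back to design time governance comes in.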

I hope this will provide some details about SOA governance.

Tuesday, November 30, 2010

Java EE 6

Let's talk about some of the major new features in Java EE 6: profiles, pruning, extensibility, and ease of development. A profile is a targeted bundle (in other words, a stack) of technologies; a profile can be a subset, a superset, or an overlapping set of the full platform. Here is part of the stack from the first profile under Java EE 6, the Web Profile:

Servlet 3.0
JSTL (JSP Standard Tag Library) 1.2
JSF 2.0
EJB 3.1 Lite
Common Annotations for Java 1.1
Java Transaction API 1.1
JPA 2.0

While Java EE 6 adds new features, it also marks some outdated technologies for pruning (removal in a future release), such as JAX-RPC, EJB entity beans, and JAXR.

Under extensibility, Java EE 6 embraces open source libraries and frameworks. Servlets, servlet filters, and context listeners shipped with a framework can be discovered and registered for you, which allows zero-configuration use of that framework, and scripting languages can be used in some cases.
And of course, CDI (Contexts and Dependency Injection) is Java EE's answer to Spring DI and provides similar functionality to developers.

Overall JEE6 looks interesting and you might want to take a look at full stack from Sun, oops from Oracle :-)

Friday, November 19, 2010

What's new in Spring 3.1 - Part II

As you can see from my previous post, I love talking about the Spring Framework. Recently, at SpringOne 2GX, the Spring team announced how Spring 3.1 will be laid out. It surely looks promising, as the Spring team puts a lot of thought into each Spring Framework milestone. Here are most of the features that you are going to see in Spring 3.1.

Scheduler enhancement - you can schedule a cron-style job that calls a method on a Java bean, similar to a timer bean.
Improved Java EE 6 support - continued feature enhancements for Java EE 6.
Environment profiles for beans - I think this will be widely used, as it allows you to have a profile for each environment, such as development, test, and production. In other words, it groups bean definitions for activation in specific environments.
Java-based application configuration - I think this will help developers who do not want to deal with XML configuration and would rather configure beans using annotations in a Java class.

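@Configuration
public class ApplicationConfiguration {
    // Assumed to be injected by the container; the original snippet does not show where ds comes from
    @Autowired
    private DataSource ds;

    // Each @Bean method declares one bean definition
    @Bean
    public OrderDetail orderDetail() {
        return new OrderDetailImpl(ds);
    }
}
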
Cache abstraction - provides support for distributed caching, which is especially useful for caching solutions in the cloud. It will also have common adapters for cache providers like Ehcache. There is support for a convenient caching annotation, "@Cacheable", and cache manager implementations such as EhCacheCacheManager and GemFireCacheManager.

Conversation management - provides an abstraction for conversational sessions. You should be able to identify conversational state, perhaps by passing around an ID for it; I think you might be able to externalize session information and keep it in a distributed cache. This theme will also provide the foundation for Web Flow 3.0.

Servlet 3.0, JSF 2.0, Groovy - automatic deployment of framework listeners, which I think might have a drawback, as you may not want to deploy everything (you might not be using all of the listeners). There will be more support for JSF 2.0, and there might be template classes for Groovy support.

Friday, October 29, 2010

Key Concepts for Building an Internal Cloud

Here are some important Key Ingredients for Building an Internal Cloud:

1. Shared Infrastructure. IT needs to understand how to configure the underlying storage and networking so that, when brought together, they can be shared across all of the enterprise’s different workloads. IT also needs to determine where in that shared infrastructure to delineate between different users, and to fully understand the current infrastructure in order to reuse some of its capabilities. Key areas to consider include database and queue provisioning on that infrastructure.

2. Self-Service Automated Portal. It is essential to make sure that the compute cloud can be consumed in an easy form by both developers and IT professionals. There is a need for self-service capabilities, and for highly automated provisioning portals that provide the ability to add workloads without having to go through all of the many different steps of provisioning with the network and underlying storage.

3. Scalable. An effective cloud solution has to be scalable. IT organizations should think about boundary conditions in a more creative way, instead of using the traditional models of scalability. As a new workload request comes up, they must determine where to provision that specific workload.

4. Rich Application Container. Clouds need to have a richer application container that will show the different interdependencies between components of the application, specifically those that take place between different virtual machines. This information helps create the correct network subnets so that the components and their storage work well together and are not isolated from one another.

5. Programmatic Control. It is very common for a compute cloud to have programmatic control. Some of the more popular compute clouds on the market today expose a REST API. REST is a simple architectural style, typically used over HTTP, in which resources are manipulated in a clear, uniform way through standard HTTP methods. It also helps if the organization already has a Service Oriented Architecture (SOA), which provides interoperable, easy-to-use services, promotes reuse, and keeps the different applications within an enterprise loosely coupled.
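As a rough illustration of programmatic control over REST, the following Java sketch builds the HTTP requests a client might send to create, read, and delete a virtual machine. The base URL and the /vms resource path are placeholders, not a real cloud endpoint, and the class name is made up. It assumes Java 11 or later for java.net.http; the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch: one HTTP method per operation on a VM resource.
class CloudApiSketch {
    // Placeholder base URL, not a real service.
    static final String BASE = "https://cloud.example.com/api";

    // POST to the collection creates a new VM from a JSON description.
    static HttpRequest createVm(String jsonBody) {
        return HttpRequest.newBuilder(URI.create(BASE + "/vms"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // GET on a single resource reads its current state.
    static HttpRequest getVm(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/vms/" + id)).GET().build();
    }

    // DELETE on a single resource removes it.
    static HttpRequest deleteVm(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/vms/" + id)).DELETE().build();
    }
}
```

Sending any of these would be a matter of passing the request to an HttpClient; the point of the sketch is that the same URL scheme and a handful of HTTP verbs cover the whole lifecycle of a resource.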

6. 100% Virtual Hardware Abstraction. Clouds need 100% hardware abstraction. This can include servers or other physical devices like storage. In a cloud environment, the user should be able to interact with the virtual machines and other devices through the user interface, versus actually changing physical infrastructure.

7. Strong Multi-Tenancy. Strong multi-tenancy involves extensive use of VLANs to isolate network traffic between different zones in the cloud. This is obviously critical in an external cloud, but also a common requirement in internal clouds, to make sure that only authorized users have access to certain applications.

8. Chargeback. This may not apply fully when you are building an internal cloud within an organization, but it is good to have this practice, as IT organizations must be able to create effective and accurate chargeback capabilities. For internal clouds, even if funds aren’t literally exchanged, the ability to create transparency in costs and services can help justify expenses.

Saturday, October 16, 2010

What's new in Spring 3.1?

With all of the good stuff in Spring 3.0, it's hard to imagine what could possibly follow in Spring 3.1.

I just got this news from the SpringOne 2GX event site: they are introducing a number of often-requested configuration features. Need a standalone datasource in dev, but one from JNDI in production? Environment-Specific Bean Definitions are a first-class approach to solving this very common kind of problem. Love code-based configuration, but need the power and concision of Spring XML namespaces?
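The dev-versus-production datasource case can be pictured with a plain Java sketch. This is not the Spring 3.1 API; ProfileRegistry and the profile names are made up for illustration, and strings stand in for real DataSource objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of environment-specific definitions (not Spring code).
class ProfileRegistry {
    private final Map<String, Supplier<String>> dataSources = new HashMap<>();

    // Register one definition per environment profile.
    void register(String profile, Supplier<String> definition) {
        dataSources.put(profile, definition);
    }

    // Only the definition registered for the active profile is ever created.
    String dataSourceFor(String activeProfile) {
        Supplier<String> definition = dataSources.get(activeProfile);
        if (definition == null) {
            throw new IllegalArgumentException("no definition for profile " + activeProfile);
        }
        return definition.get();
    }

    public static void main(String[] args) {
        ProfileRegistry registry = new ProfileRegistry();
        registry.register("dev", () -> "standalone in-memory datasource");
        registry.register("production", () -> "datasource looked up from JNDI");
        System.out.println(registry.dataSourceFor("dev"));
    }
}
```

The application code asks for "the datasource" and never needs to know which environment it is running in; only the active profile changes between deployments.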

I'm guessing they will have support for or integration with other languages like Scala and Groovy. The event is one week away, and I will post more as I get more info.

Just to see what we have so far from spring, here are some points from Spring 3.0 enhancements

Full-scale REST support in Spring MVC, including Spring MVC controllers that respond to REST-style URLs with XML, JSON, RSS, or any other appropriate response.

A new expression language that brings Spring dependency injection to a new level by enabling injection of values from a variety of sources, including other beans and system properties.

New annotations for Spring MVC, including @CookieValue and @RequestHeader, to pull values from cookies and request headers, respectively.

A new XML namespace for easing configuration of Spring MVC.

Support for declarative validation with JSR-303 (Bean Validation) annotations.

Support for the new JSR-330 dependency injection specification.

Annotation-oriented declaration of asynchronous and scheduled methods.

A new annotation-based configuration model that allows for nearly XML-free Spring configuration.

The Object-to-XML (OXM) mapping functionality from the Spring Web Services project has been moved into the core Spring Framework.

Thursday, September 23, 2010

Web caching in the Cloud

This is the buzzword in the enterprise right now if you are trying to implement a private cloud. I think it can be confusing to an extent: are we talking about application-side caching, or about caching in the cloud?

For applications, Ehcache, Terracotta, and IBM WebSphere eXtreme Scale are some good solutions.

Web application performance can be improved by caching frequently accessed data in high-speed memory instead of fetching it from databases each time. This can dramatically reduce response times, let sites scale economically, and substantially reduce the load placed on databases, file servers, and other information sources.
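The basic idea can be sketched in a few lines of Java. This is a minimal in-process illustration, not how Ehcache or any of the products above are implemented: the CacheAside class is hypothetical, the loader function stands in for the slow database call, and eviction is simple least-recently-used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache-aside helper: check memory first, fall back to the slow source.
class CacheAside<K, V> {
    private final Function<K, V> loader;  // stands in for the database query
    private final Map<K, V> cache;
    private int loads = 0;                // how often the slow source was hit

    CacheAside(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: load from the source and remember the result.
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }
}
```

Every call that is answered from the map is a query the database never sees, which is where the reduced response time and reduced database load come from.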

Now let's talk about web caching in the cloud and some players in this space: Memcached and Gear6 (which provides an enhanced solution).

The open‐source distributed cache system Memcached has quickly become the standard for web caching, particularly for operators of dynamic, high‐growth web sites. Memcached stores frequently‐used data in DRAM instead of in databases, providing response time improvements of 100x or more.

It is not surprising that many large organizations are interested in both cloud computing and Memcached web caching, since they offer complementary advantages in cost savings, flexibility and ease of management. But the two have evolved in different communities, largely in isolation from one another, and they do not operate together as seamlessly as might be desired.

As an example, the ability to rapidly scale applications up or down in size is a key advantage of cloud computing, but standard Memcached does not accommodate it very well. Changing the size of a Memcached cache tier causes the cache data to be flushed and then rebuilt. This typically degrades site performance and causes spikes in the demands on source databases and file servers. Rebuilding the contents of a large cache tier can take hours, and site performance may not be restored to normal until the end of the rebuilding process.
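One way to see why resizing hurts is to count how many keys end up on a different server when the tier grows. The Java sketch below assumes the client places keys with simple modulo hashing (a common default, but an assumption here; the text above does not say how keys are distributed) and uses the integers 0..9999 as stand-ins for key hashes.

```java
// Hypothetical illustration of key placement in a cache tier.
class ModuloSharding {
    // Naive client-side placement: server index = key hash modulo server count.
    static int serverFor(int keyHash, int serverCount) {
        return Math.floorMod(keyHash, serverCount);
    }

    // Counts how many keys land on a different server after resizing the tier.
    static int remappedKeys(int keyCount, int oldServers, int newServers) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (serverFor(key, oldServers) != serverFor(key, newServers)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Growing from 4 to 5 servers: prints "8000 of 10000 keys move"
        System.out.println(remappedKeys(10_000, 4, 5) + " of 10000 keys move");
    }
}
```

With four servers growing to five, 80% of the keys are looked up on a server that does not hold them, so most of the cache is effectively empty until it is rebuilt from the databases.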

You can read more about Memcached and Gear6, and Memcached can be downloaded from:

Download Memcached

Monday, September 20, 2010


Almost a year back I started looking into options for providing fast search against large sets of data in the enterprise. For example, you might want to search for a keyword in exceptions, such as those in sysout or error logs for enterprise applications.

I think MapReduce is an answer to these problems. Here's some info about MapReduce:

MapReduce is a parallel and distributed approach developed by Google for processing large datasets. MapReduce is used by Google and Yahoo to power their web search. MapReduce was first described in a research paper from Google.

MapReduce has two key components, Map and Reduce. Map is a function which is applied to a set of input values and calculates a set of key/value pairs. Reduce is a function which takes these results and applies another function to the result of the map function. Or in other words: Map transforms a set of data into key/value pairs, and Reduce aggregates the values for each key into a single result. A reducer receives all the data for an individual "key" from all the mappers.

The approach assumes that there are no dependencies between the input data. This makes it easy to parallelize the problem. The number of parallel reduce tasks is limited by the number of distinct "key" values which are emitted by the map function.

MapReduce usually also incorporates a framework. A master controls the whole MapReduce process. The MapReduce framework is responsible for load balancing, re-issuing tasks if a worker has failed or is too slow, and so on. The master divides the input data into separate units, sends individual chunks of data to the mapper machines, and collects the information once a mapper is finished. When the mappers are finished, the reducer machines are assigned work. All key/value pairs with the same key are sent to the same reducer.
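To tie this back to the log-search example, here is a single-machine Java sketch of the three steps: map, group by key, and reduce. It is only an illustration of the programming model, not Hadoop or Google code; the class and method names are made up, and the run method stands in for what the framework and master would do across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the MapReduce model: count words in log lines.
class MiniMapReduce {
    // Map: turn one log line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: aggregate all values for one key into a single number.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: run map, group pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Because each call to map looks at one line only and each call to reduce looks at one key only, both loops could be spread over many machines without changing the result.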

Here's a nice video about MapReduce:

Apache Hadoop, a large open source project that implements MapReduce, is getting very popular in this space.

Here's a link if you want to know more about Hadoop:

Apache Hadoop

If you want a paid solution and don't have time to develop your own, take a look at Splunk. It's a nice, easy, and very powerful tool, and you can feed it any data. I have played with Splunk, and I believe it uses Hadoop APIs.

Thursday, September 16, 2010

Enterprise Architect vs Solution Architect

I find this subject very interesting, and so far I haven't found any concrete definitions of these two roles in IT or of the main differences (in terms of responsibility) between them. I think the TOGAF and Zachman architecture frameworks provide detailed explanations of the roles and responsibilities in an enterprise. I would like to share what I have learned so far.

So what exactly is an Enterprise Architect vs a Solution Architect?

A Solution Architect may have a number of different types of architects working for him/her to help accomplish their task for delivering a high quality solution. For example, s/he might have an Infrastructure Architect to manage the architecture of the solution's hardware configuration so that the solution meets the Quality of Service business and IT requirements while at the same time represents the optimum way to deploy the solution into the target production environments.

There also might be a need for 'specialist' architects depending on the complexities of the business requirements or the target production environment. Such 'specialist' architects usually are called Domain Architects, and examples include Security Architects, Technology Architects (i.e. architects that specialize in a specific product or technology), and Vertical Architects (i.e. architects that specialize in systems for particular industry verticals such as Financial Services, Telecommunications, Public Sector, etc.).

An Enterprise Architect may also have a number of different types of architects in order to piece together a coherent, actionable future state architecture which can easily map to business strategy, be consumed by project teams and be a major contribution in governance activities.

For example, an Enterprise Architect may have Business Architects which document Business Strategies, Business Capabilities, Business Processes and Roles as well as a number of other artifacts.

Lastly, an Enterprise Architect may also have Solution Architects (aka Enterprise Solution Architects and Enterprise Application Architects) who focus on pulling all of the other Enterprise Architecture information together to shape the future state application system architecture.

Here is one place where I think there is confusion for many. You see, I described a Solution Architect at the project level and an Enterprise Solution Architect across the enterprise, and both share the words 'Solution Architect' in their role. They have VERY different purposes but are very complementary. Could it be that simple a reason why confusion exists? It just might be.

The Enterprise Solution Architect and project Solution Architect roles are very complementary, and because this relationship is of particular interest to me, I'd like to take a moment and propose how they should work together.

I think that the Enterprise Solution Architect should be involved at the earliest point a new business initiative is created and do very high-level 'solutioning' via whiteboard and reference to future state architecture elements. Then, when the business initiative gains momentum and is backed by Enterprise Architecture future state analysis, the core project team is formed and the project Solution Architect takes over responsibility for the solution. The Enterprise Solution Architect becomes a member of the project Solution Architect's team and is responsible for the system integration architecture where the solution requires integration between enterprise systems, as well as for providing future state system architecture guidance to help the project Solution Architect decide which systems to commit to using in the solution. The Enterprise Solution Architect is then involved in governance boards to inform decision-makers with future-state architecture artifacts that help justify major technology, system, or business value decisions. That's it.

As a quick summary, project Solution Architects and Enterprise Architects are different in that they have very different purposes. They are highly complementary in that Solution Architects focus on delivery of solutions and Enterprise Architects focus on supporting them by documenting future state, participating on their teams and being involved in governance activities.

Friday, August 27, 2010

Groovy: Superset of Java

Groovy is a bit like a scripting version of Java, and is often described as a superset of Java. It is in no way meant to replace Java; long-time Java
programmers feel at home when building agile applications backed by big architectures. One nice thing about Groovy is that it
reduces the amount of code needed to do common tasks such as parsing XML files and accessing databases. Grails (originally "Groovy on Grails") is,
as the name suggests, similar to the Rails framework. It is for people who don't want to bother with Java's verbosity all the time.

The big advantage is speed, brevity and readability. For instance here is some java code which implements a bag:

import java.util.HashMap;
import java.util.Map;

public class Bag {

    // Maps each object to the number of times it has been added
    private Map<Object, Integer> counter = new HashMap<Object, Integer>();

    public void add(Object obj) {
        add(obj, 1);
    }

    public void add(Object obj, Integer count) {
        Integer value = counter.get(obj);
        if (value == null)
            counter.put(obj, count);
        else
            counter.put(obj, value + count);
    }

    public Integer getCount(Object obj) {
        return counter.containsKey(obj) ? counter.get(obj) : 0;
    }
}

And here is the same implementation in Groovy:

class Bag {
    def counter = [:]

    // count defaults to 1, so no overloaded add(term) is needed
    def add(term, count=1) {
        if (counter[term] == null)
            counter[term] = count
        else
            counter[term] += count
    }

    def getCount(term) {
        return counter[term] != null ? counter[term] : 0
    }
}

So you can see how concise the Groovy code is: you don't need to declare types, and a default parameter value (count=1) replaces the overloaded add method.

That said, Groovy is essentially Java, just wrapped around a different compiler, class loader and with many added methods and
members. For that reason it is unlikely that it will ever replace Java, not only that but because it IS Java it is bound by the same
limitations and rules and doesn't provide any functionality which is really unique.

Semicolons are optional. Use them if you like (though you must use them to put several statements on one line).
The return keyword is optional.
You can use the this keyword inside static methods (which refers to this class).
Methods and classes are public by default.
Protected in Groovy has the same meaning as protected in Java, i.e. you can have friends in the same package and derived classes can also see protected members.
Inner classes were not supported before Groovy 1.7. In most cases you can use closures instead.
The throws clause in a method signature is not checked by the Groovy compiler, because there is no difference between checked and unchecked exceptions.
You will not get compile errors like you would in Java for using undefined members or passing arguments of the
wrong type. See Runtime vs Compile time, Static vs Dynamic.
Java can always be made faster, more memory efficient and more robust than Groovy if you are willing to put in the time and

So where would you use Groovy?

I think typically in places where readability, flexibility and speed of development are more important than performance,
extensibility and a large user base. For instance, in deployment you can use the Groovy equivalent of Ant, called Gant, which
allows better flexibility and better readability. Other places might be in a web application framework such as Grails.

You can use Groovy all the time for utilities, both on the command line and on the web. Often, the utilities use jars/class files
from your project, since it is all on the JVM.

For web utils, take a look at Groovlets. You can come up to speed with Groovlets in a couple of hours. A groovlet is simply a
servlet distilled down to its essence.

Also, Groovy integrates with other open source frameworks like the Spring Framework. Groovy is great for writing unit tests for existing Java code and could save lots of development time.

I hope this brief introduction helps in understanding what Groovy is and how it can help in your project.

Friday, August 20, 2010

Scala Vs Java

For the last few days I have been looking into other languages and comparing them against Java. Scala seems to be a better choice compared to Ruby and others. Let's talk about Scala in this post, with a small code comparison between Scala and Java. To start with, Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages.

Scala is object-oriented: Scala is a pure object-oriented language in the sense that every value is an object. Types and behavior of objects are described by classes and traits. Class abstractions are extended by subclassing and a flexible mixin-based composition mechanism as a clean replacement for multiple inheritance.

Scala is functional: Scala is also a functional language in the sense that every function is a value. Scala provides a lightweight syntax for defining anonymous functions, it supports higher-order functions, it allows functions to be nested, and supports currying. Scala's case classes and its built-in support for pattern matching model algebraic types used in many functional programming languages.

Scala is statically typed: Scala is equipped with an expressive type system that enforces statically that abstractions are used in a safe and coherent manner.

Scala is extensible: The design of Scala acknowledges the fact that in practice, the development of domain-specific applications often requires domain-specific language extensions. Scala provides a unique combination of language mechanisms that make it easy to smoothly add new language constructs in form of libraries:

any method may be used as an infix or postfix operator, and closures are constructed automatically depending on the expected type (target typing).

A joint use of both features facilitates the definition of new statements without extending the syntax and without using macro-like meta-programming facilities.

Scala interoperates with Java and .NET: Scala is designed to interoperate well with popular programming environments like the Java 2 Runtime Environment (JRE) and the .NET Framework (CLR). In particular, the interaction with mainstream object-oriented languages like Java and C# is as smooth as possible. Scala has the same compilation model (separate compilation, dynamic class loading) like Java and C# and allows access to thousands of high-quality libraries.

Advantages of Scala:

The Scala language has been designed to be scalable: it scales according to user needs. Its object-oriented concepts allow the language to be used for larger and more complex projects, while its functional constructs allow programmers to write short and concise code.

Portability: Scala runs on Java Virtual machine (JVM) this ensures the portability to all the underlying platforms. Also there is an initiative for building Scala on top of .NET Common language Runtime (CLR).

Performance: Scala compiles to JVM bytecode, so runtime performance is generally comparable to Java.

Every value in Scala is an object and every operation is a method call. Because of this pure object-oriented model, Scala does not have static fields and methods; singleton objects take their place. Scala allows you to define your own objects and constructs, which makes it possible to scale up according to user needs.

Scala uses static typing, but it also allows most types to be left unspecified. When type information is omitted, the compiler uses type inference to work it out from the code.

It incorporates ideas from many languages, such as mixins (a way to address multiple inheritance that comes from the LISP world) and actors (a message-passing concurrency model).

The Java ecosystem is available to Scala.
XML literals can be embedded directly in Scala code.
New design paradigms become available (working with closures, traits).
Learning speed for Scala is faster than that of C++ or Java for the same task.
IDEA (IntelliJ), NetBeans and Eclipse all have good support for Scala.

Disadvantages of Scala:

Inferior quality of the documentation
Few native libraries (Scala builds its own on top of Java or .NET libraries)
IDE and debugging support is not as good as Java's.
No localized documentation
It has a larger syntax to learn.
It consistently breaks backwards compatibility.
Skilled programmers are required to use the features of the language.
It is still in the early adopter phase.
Lack of extensive tutorials

Now let's look at a small piece of code and see how the two languages differ.

A Better OOP Language

Scala works seamlessly with Java. You can invoke Java APIs, extend Java classes and implement Java interfaces. You can even invoke Scala code from Java, once you understand how certain “Scala-isms” are translated to Java constructs (javap is your friend). Scala syntax is more succinct and removes a lot of tedious boilerplate from Java code.

For example, the following Person class in Java:

class Person {
    private String firstName;
    private String lastName;
    private int age;

    public Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getFirstName() { return this.firstName; }

    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getLastName() { return this.lastName; }

    public void setAge(int age) { this.age = age; }
    public int getAge() { return this.age; }
}

can be written in Scala thusly:

class Person(var firstName: String, var lastName: String, var age: Int)

Yes, that’s it. The constructor is the argument list to the class, where each parameter is declared as a variable (var keyword). It automatically generates the equivalent of getter and setter methods, meaning they look like Ruby-style attribute accessors; the getter is foo instead of getFoo and the setter is foo = instead of setFoo. Actually, the setter function is really foo_=, but Scala lets you use the foo = sugar.

Lots of other well designed conventions allow the language to define almost everything as a method, yet support forms of syntactic sugar like the illusion of operator overloading, Ruby-like DSL’s, etc.

You also get fewer semicolons, no requirements tying package and class definitions to the file system structure, type inference, multi-valued returns (tuples), and a better type and generics model.

One of the biggest deficiencies of Java is the lack of a complete mixin model. Mixins are small, focused (think Single Responsibility Principle ...) bits of state and behavior that can be added to classes (or objects) to extend them as needed. In a language like C++, you can use multiple inheritance for mixins. Because Java only supports single inheritance and interfaces, which can’t have any state and behavior, implementing a mixin-based design has always required various hacks. Aspect-Oriented Programming is also one partial solution to this problem.

The most exciting OOP enhancement Scala brings is its support for Traits, a concept first described here and more recently discussed here. Traits support Mixins (and other design techniques) through composition rather than inheritance. You could think of traits as interfaces with implementations. They work a lot like Ruby modules.

Now let's look at one design pattern example and compare the two languages.

Here is an example of the Observer Pattern written as traits, where they are used to monitor changes to a bank account balance. First, here are reusable Subject and Observer traits.

This trait looks exactly like a Java interface. In fact, that’s how traits are represented in Java byte code. If the trait has state and behavior, like Subject, the byte code representation involves additional elements.

trait Observer[S] {
  def receiveUpdate(subject: S)
}

trait Subject[S] {
  this: S =>

  private var observers: List[Observer[S]] = Nil

  def addObserver(observer: Observer[S]) = observers = observer :: observers

  def notifyObservers() = observers.foreach(_.receiveUpdate(this))
}

In Scala, generics are declared with square brackets, [...], rather than angled brackets, <...>. Method definitions begin with the def keyword. The Observer trait defines one abstract method, which is called by the Subject to notify the observer of changes. The Subject is passed to the Observer.
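For comparison, here is a rough sketch of how the same two abstractions might look in plain Java. It is only an illustration, not code from the original example: the Account class, the self() method and the observedBalances helper are names I made up, and the lambda assumes Java 8 or later. Because Java has no traits, the Subject state has to live in an abstract base class, which uses up the single inheritance slot.

```java
import java.util.ArrayList;
import java.util.List;

public class ObserverSketch {
    // Counterpart of the Scala Observer[S] trait: a plain Java interface.
    public interface Observer<S> {
        void receiveUpdate(S subject);
    }

    // Counterpart of the Scala Subject[S] trait, written as an abstract class
    // because a Java interface of this era cannot hold state.
    public static abstract class Subject<S> {
        private final List<Observer<S>> observers = new ArrayList<>();

        // Stands in for the Scala self-type annotation "this: S =>".
        protected abstract S self();

        public void addObserver(Observer<S> observer) {
            observers.add(observer);
        }

        public void notifyObservers() {
            for (Observer<S> observer : observers) {
                observer.receiveUpdate(self());
            }
        }
    }

    // Hypothetical bank account whose balance changes are observed.
    public static class Account extends Subject<Account> {
        private double balance;

        @Override
        protected Account self() {
            return this;
        }

        public double getBalance() {
            return balance;
        }

        public void deposit(double amount) {
            balance += amount;
            notifyObservers();
        }
    }

    // Makes the given deposits and returns the balances an observer saw.
    public static List<Double> observedBalances(double... deposits) {
        List<Double> seen = new ArrayList<>();
        Account account = new Account();
        account.addObserver(subject -> seen.add(subject.getBalance()));
        for (double amount : deposits) {
            account.deposit(amount);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observedBalances(100.0, 50.0));
    }
}
```

Note how Account can no longer extend anything else, which is exactly the limitation that traits remove.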

There are lots of other differences between these languages, and I hope this small post will help others who are trying to evaluate another language, or trying to see what's out there...

Thursday, August 12, 2010

Spring RestTemplate

It's hard to say anything about Spring without seeming biased in its favor. Spring's RestTemplate is as solid as can be expected. The standard callback-based approach for safe release of resources works well for the REST template too. If one is using Spring MVC and its REST support, very few reasons would drive me to consider an alternative framework for the client. One of them is finer-grained control; another is HttpClient 4.x support. Documentation on RestTemplate is sparse as well, but there is a community to fall back on. An adopter might need a bit of up-front customization (standard callbacks and so on), but once that is done I feel RestTemplate would be easy to work with.

Clearly one has many choices in selecting a client-side framework for RESTful HTTP. In most cases it probably makes sense to use the same framework for the service and client ends. Then again, if you are only a consumer of a service, you have multiple choices among those shown above, as well as the option of using Apache HttpClient or HttpComponents directly and bypassing the higher-level frameworks. For some, integrating with the Spring framework is important, and all the above frameworks offer integration points on both the service and client sides. Support for client proxies is something one might want to consider, as they tend to simplify the programming model. Further, if resource definitions can be shared between client and server, that can be quite useful for staying DRY (Don't Repeat Yourself) and provides a means of contract definition. For those interested in performance and tuning of HTTP connections, using a framework that allows you to manage connection pooling and other parameters is definitely the way to go.

For more information on Spring RestTemplate, visit Spring blog

Spring RestTemplate Example:

Friday, July 23, 2010

EJB 3.1

Today I'm going to talk about something interesting: EJB 3.1. After so many years in development, Sun has finally provided an easy-to-use spec for the EJB world in the new, enhanced EJB 3.1. I can see some similarity between the EJB 3.1 spec and the Spring Framework. Let's check some features of EJB 3.1.

EJB 3.1 builds on the ease-of-use enhancements in EJB 3.0 by providing many new ways to improve developer productivity. Chief among these are the ability to implement session beans using only a bean class and the ability to package enterprise bean classes directly in a .war file, without the need for an ejb-jar file.

No business interface (optional)
Singleton bean
Application callback
Annotations (exist in the 3.0 spec)
No need for an ejb-jar file

Let's discuss the points above.

No Interface

For example, the following session bean exposes a no-interface view:

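@Stateless
public class HelloBean {
    @EJB private PropertiesBean propertiesBean; // assumed: the injected singleton defined later in this post
    public String sayHello() {
        String message = propertiesBean.getProperty("hello.message");
        return message;
    }
}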

As is the case for a local view, the client of a no-interface view always acquires an EJB reference -- either through injection or JNDI lookup. The only difference is that the Java type of the EJB reference is the bean class type rather than the type of a local interface. This is shown in the following bean client:

@EJB
private HelloBean helloBean;

String msg = helloBean.sayHello();

Note that even though there is no interface, the client cannot use the new() operator to explicitly instantiate the bean class. That's because all bean invocations are made through a special EJB reference, or proxy, provided by the container. This allows the container to provide all the additional bean services such as pooling, container-managed transactions, and concurrency management.


Singleton Bean

A singleton is a new kind of session bean that is guaranteed to be instantiated once for an application in a particular Java Virtual Machine (JVM). A singleton is defined using the @Singleton annotation, as shown in the following code example:

@Singleton
public class PropertiesBean {

    private Properties props;
    private int accessCount = 0;

    public String getProperty(String name) { ... }

    public int getAccessCount() { ... }
}

Because it's just another flavor of session bean, a singleton can define the same local and remote client views as stateless and stateful beans. Clients access singletons in the same way as they access stateless and stateful beans, that is, through an EJB reference. For example, a client can access the above PropertiesBean singleton as follows:

@EJB
private PropertiesBean propsBean;

String msg = propsBean.getProperty("hello.message");

Here, the container ensures that all invocations to all PropertiesBean references in the same JVM are serviced by the same instance of the PropertiesBean. By default, the container enforces the same threading guarantee as for other component types. Specifically, no more than one invocation is allowed to access a particular bean instance at any one time. For singletons, that means blocking any concurrent invocations. However, this is just the default concurrency behavior. There are additional concurrency options that allow more efficient concurrent access to the singleton instance.

Application Call Back

Here is an example that shows part of a singleton that includes a @Startup annotation as well as @PostConstruct and @PreDestroy methods:

@Singleton
@Startup
public class PropertiesBean {

    @PostConstruct
    private void startup() { ... }

    @PreDestroy
    private void shutdown() { ... }
}

No ejb-jar file

Before EJB 3.1, enterprise bean classes had to be packaged in an ejb-jar file, even for a simple web application. The EJB 3.1 specification removes this restriction. You now have the option of placing EJB classes directly in the .war file, using the same packaging guidelines that apply to web application classes. This means that you can place EJB classes under the WEB-INF/classes directory or in a .jar file within the WEB-INF/lib directory. The EJB deployment descriptor is also optional. If it's needed, you can package it as a WEB-INF/ejb-jar.xml file.


Sun offers a reference implementation for JAX-RS code-named Jersey. Jersey uses an HTTP web server called Grizzly, and the Jersey servlet (com.sun.jersey.spi.container.servlet.ServletContainer) handles the requests to Grizzly. You can develop production-quality JAX-RS applications today using Jersey, which implements all the APIs and provides all the necessary annotations for creating RESTful web services in Java quickly and easily. Beyond the set of annotations and features defined by JAX-RS, Jersey provides additional features through its own APIs, such as the Jersey Client API.

You can download Jersey separately or acquire it as a bundle with NetBeans 6.5 and GlassFish V3.

Developing RESTful Web Services Using JAX-RS
The classes and interfaces you use for creating RESTful web services with JAX-RS are available in the following packages:

Below is a link to my previous post with a Jersey example; I will write about the Spring RestTemplate in my next post...

Jersey Example

Sunday, July 11, 2010

REST Framework implementation

I'm going to write a series of posts on frameworks that can be used for REST implementations. I have looked at different JAX-RS vendors such as Jersey, RESTEasy and Restlet. On the client side, however, the JAX-RS specification does not include a client API, and the different JAX-RS vendors have each created their own. This means the frameworks one selects for the client and service ends do not have to be the same. For example, one could have a RESTful service running on Restlet and a client developed in RESTEasy that consumes it. In most cases, one will be satisfied with using the same framework for the client as for the service, in order to be consistent and potentially re-use artifacts in both areas if the framework permits it. I have listed a few options below; take a look at them.

Jersey - The JAX-RS reference implementation from Oracle/Sun.
Spring 3.0 - Spring provides a RestTemplate, quite like the JmsTemplate and HibernateTemplate, in its support of REST clients.
CXF - An Apache project that is a merger of XFire and Celtix. Provides a JAX-RS implementation as well as JAX-WS.
RESTEasy - A JAX-RS implementation by JBoss.
Restlet - An up-and-coming framework that has support for client-side REST calls.
Apache Wink - A JAX-RS implementation with client- and server-side APIs that had its 1.0 release just recently.
Apache HttpClient or HttpComponents - Direct use of the HTTP API. Down and dirty.

In my next post I'll be talking about Jersey and might present a small example.

Thursday, July 1, 2010

Era of Open Source

Over the last few years, open source frameworks and tools have defined a new era of technology, from frameworks to infrastructure. In my experience, the vast majority of enterprises are very comfortable with open source. Of course, my experience is naturally biased toward those companies who are comfortable paying for services and support around open source, and especially those who are replacing non-functional proprietary software with open source software.
I know that 5-10 years ago a lot of the companies now using open source would have been very hesitant to rely on it so heavily, but so many people do now that the market seems to have really changed.

I do think that going forward lots of companies will move toward open source. You already have big companies like Google, Facebook and Amazon using open source such as Hadoop, Puppet, etc. I just read an article saying that Adobe has released Puppet recipes. The open source community keeps growing day by day...

Thursday, June 24, 2010

Preview of Maven 3 features

Maven 3 is promising to be the most significant upgrade since the release of Maven 2. While maintaining backward compatibility with existing Maven 2 projects, it introduces a number of powerful and compelling new features, such as a complete rewrite of the internal architecture, OSGi support and multi-language pom files.

One exciting new feature in Maven 3 is its ability to work with pom files written in non-XML notations. The Maven core now provides an underlying DSL to access the Maven internals and write POM files in the language of your choice. This currently includes scripting languages like Groovy, Ruby, and others.

With Maven 3, you can use a Groovy DSL that maps directly to the XML pom format. So, instead of


you could write:

dependencies {
    dependency { groupId 'junit'; artifactId 'junit'; version '4.7'; scope 'test' }
}

If you're familiar with the XML pom files, this will read pretty easily - it's essentially an XML pom file without the noise generated by the XML tags. Although it's an obvious improvement, some of the transcribed Groovy DSL code might still seem a bit wordy to some. For example, a set of project dependencies might look like this:

dependencies {
    dependency {
        groupId 'junit'
        artifactId 'junit'
        version '4.7'
        scope 'test'
    }
    dependency {
        groupId 'org.hamcrest'
        artifactId 'hamcrest-all'
        version '1.1'
    }
    dependency {
        groupId 'log4j'
        artifactId 'log4j'
        version '1.2.12'
    }
}

However, you can make this more concise simply by using semicolons to separate the dependency elements:

dependencies {
    dependency { groupId 'junit'; artifactId 'junit'; version '4.7'; scope 'test' }
    dependency { groupId 'org.hamcrest'; artifactId 'hamcrest-all'; version '1.1' }
    dependency { groupId 'log4j'; artifactId 'log4j'; version '1.2.12' }
}

This is certainly more concise and more readable, and goes with the general tendency of moving away from XML as a build scripting language in favour of more lightweight notations. But the real power of this is that it is effectively an interface to the Maven 3 core that gives you full access to all of the Maven features. The Maven 3 core is rock solid, and you can leverage all the existing features and plugins from the Maven 2 ecosphere.

Maven 3 is fully backward-compatible with your existing Maven 2 projects. Here are some good sites for more information about Maven 3.

Wednesday, June 2, 2010

EJB3 & EJB 3.1 and Spring Framework

In this article I'm trying to compare what EJB 3.1 and the Spring framework have in common, if anything, and whether Sun made a mistake by releasing the EJB 3.0 spec. So the real question is: if applications are using EJB 2.1, should they migrate to EJB 3 or EJB 3.1? Or should they remove the EJBs and use the Spring framework to do everything?

Well, it turned out that both component models are surprisingly similar. You could migrate an EJB 3.1-based application to Spring almost without any additional effort (search and replace for annotations). It is even possible to run an EJB 3.1 application without ANY modification, just by tweaking Spring a bit.

Although both technologies are almost identical from the programming model perspective, the philosophy is totally different. Spring is "not just a framework" but rather a complete solution, the full stack. Spring was architected as a layer above the actual application server. The idea: you can upgrade your APIs by updating Spring without touching the application server. The DI model is just a tiny part of the Spring framework, and now there is REST support in Spring MVC, and so on and so forth.

The philosophy of EJB 3.1 is exactly the opposite. It is not a complete solution but rather "only" a component model for transactional, server-side applications. It comes with a set of suitable conventions, so you don't have to configure anything and can rely on the existing conventions. Neither annotations (except @Stateless) nor XML configuration is needed. The EJB infrastructure has to be available on the application server, so you only have to deploy your application, without the EJB "framework" bits (the Glassfish EJB 3 container is about 700kB). Its DI is not as sophisticated as that of Spring, JSR-299 or JSR-330.

Monday, May 24, 2010

Dependency Injection Pattern

The Dependency Injection pattern is best described in this article by Martin Fowler. Dependency Injection is a cross-cutting concern throughout the Java ecosystem, and it is also a very well understood problem space with a small number of popular implementations. I can count them on my fingers: Spring, Guice and JSR 299. In popularity I think Spring has an advantage, as Spring provides other modules too, which makes it an easy candidate to pick for DI and other development. Guice was the first one to provide annotations in the framework. Spring now provides various ways to wire things together, such as XML, annotations and Java config.

Here's a good comparison between Guice and Spring. I found it very interesting.

Saturday, May 15, 2010

Is Spring MVC the best web framework?

Picking the right web framework is probably a developer's nightmare. (Which one to pick? Pick the wrong one and we might end up using a duff, dead framework that few developers know.) But it has led to a ton of innovation in the web framework space. On balance I think competition and innovation are good things.

Spring provides support for REST in Spring MVC, and it can also work with three JAX-RS implementations (Jersey, RESTEasy, and Restlet). Spring MVC supports URI templates. Following REST principles, every resource gets a URI, and a better RESTful implementation links URIs together so that users can navigate between them and make better use of the architecture. Spring also supports content negotiation, which the view resolver handles pretty well, and it supports JSON. It supports the other REST principles as well, such as the HTTP methods (GET, PUT, POST, DELETE) used in a RESTful architecture. I think it surely stands as one of the best web frameworks.

Wednesday, May 5, 2010

JRebel: Build apps in less time

If you write a simple app, think about how much time goes into deploying and testing it. If you make any changes, you have to repeat this cycle again and again. This can be very frustrating and painful. Here's one tool I found that can help reduce developer time significantly. It's called JRebel.

Try it to believe it... enjoy.

Friday, April 30, 2010

JAX-RS and Jersey

To start with, let me give you a one-line overview of Jersey: Jersey is the open source JAX-RS (JSR 311) Reference Implementation for building RESTful Web services.

JAX-RS came along initially as a way of writing RESTful services on the Java platform, using annotations and loose coupling to bind resource beans and their public methods to URIs, HTTP methods and MIME content type negotiation. I've said before that I think it's awesome, one of the most impressive JSRs we've had.

Root Resource:

Root resource classes are simple POJOs (Plain Old Java Objects) that are annotated with @Path and have at least one method annotated with @Path or a resource method designator annotation such as @GET, @PUT, @POST, or @DELETE. Resource methods are methods of a resource class annotated with a resource method designator.

The following code is an example of a root resource class that uses JAX-RS annotations on plain Java objects to create RESTful Web services. The example code shown here is from one of the samples that ships with Jersey, the zip file of which can be found in the maven repository here.



// The Java class will be hosted at the URI path "/helloworld"
@Path("/helloworld")
public class HelloWorldResource {
    // The Java method will process HTTP GET requests
    @GET
    // The Java method will produce content identified by the MIME Media
    // type "text/plain"
    @Produces("text/plain")
    public String getClichedMessage() {
        // Return some cliched textual content
        return "Hello World";
    }
}

Relative URI: @Path

The @Path annotation's value is a relative URI path. In the example above, the Java class will be hosted at the URI path /helloworld. This is an extremely simple use of the @Path annotation. What makes JAX-RS so useful is that you can embed variables in the URIs.

URI path templates are URIs with variables embedded within the URI syntax. These variables are substituted at runtime in order for a resource to respond to a request based on the substituted URI. Variables are denoted by curly braces. For example, look at the following @Path annotation:

Here a user will be prompted to enter their name, and then a Jersey web service configured to respond to requests to this URI path template will respond. For example, if the user entered their username as "Vikas", the web service will respond to the following URL:
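To make the substitution idea concrete, here is a small plain-Java sketch of how a URI path template could be matched against a request path. This is not how Jersey implements it. The class name, the match method and the /users/{username} template are all hypothetical stand-ins of mine, and I am assuming that each variable matches a single path segment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathTemplateSketch {
    // Finds {name} placeholders inside a template.
    private static final Pattern VARIABLE = Pattern.compile("\\{(\\w+)\\}");

    // Returns the variable bindings if the path matches the template,
    // or null if it does not match.
    public static Map<String, String> match(String template, String path) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher m = VARIABLE.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text before the variable is matched verbatim.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            // Assumption: a variable matches one path segment (no slashes).
            regex.append("([^/]+)");
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Matcher pathMatcher = Pattern.compile(regex.toString()).matcher(path);
        if (!pathMatcher.matches()) {
            return null;
        }
        Map<String, String> variables = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            variables.put(names.get(i), pathMatcher.group(i + 1));
        }
        return variables;
    }

    public static void main(String[] args) {
        // Hypothetical template and path, for illustration only.
        System.out.println(match("/users/{username}", "/users/Vikas"));
    }
}
```

In real JAX-RS code you would not write any of this yourself; the runtime does the matching and hands you the value through a parameter annotation.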

Better Content Negotiation Support: @Produces & @Consumes

The @Produces annotation is used to specify the MIME media types of representations a resource can produce and send back to the client. In this example, the Java method will produce representations identified by the MIME media type "text/plain".
You might prefer to return HTML over XML/JSON, so that unless clients ask specifically for XML or JSON, you return HTML.

Http Methods: @GET, @PUT, @POST, @DELETE and @HEAD

These are resource method designator annotations defined by JAX-RS and which correspond to the similarly named HTTP methods.

Sunday, April 18, 2010

Working with Spring Roo -2

I just finished creating a project in Spring Roo with Hibernate as the JPA provider. What's cool about it is that Roo takes care of everything and creates a Spring project for you, including all source classes, the Spring applicationContext.xml and all test classes. It also creates a pom.xml for you to use for build and test. So far I have been able to run the project using Selenium, with Tomcat as the server.

Here are some commands:

project --topLevelPackage com.vikasroo (creates project structure, pom.xml and spring app context)
persistence setup --provider HIBERNATE --database (creates JPA project with Hibernate as JPA provider)
entity --name ~.vikas.Kuma (creates Entity for JPA)
test integration (will verify common JPA operations in test)

It's all about TAB in Spring Roo... try it out. It's good to have some prior Maven knowledge to use this.

Monday, April 12, 2010

Amazon Announced Simple Notification Service

Amazon recently announced the "Simple Notification Service". Here's some basic information about SNS:

Amazon Simple Notification Service (Amazon SNS) is a web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, flexible, and cost-effective capability to publish messages from an application and immediately deliver them to subscribers or other applications. It is designed to make web-scale computing easier for developers.

Amazon SNS provides a simple web services interface that can be used to create topics you want to notify applications (or people) about, subscribe clients to these topics, publish messages, and have these messages delivered over clients’ protocol of choice (i.e. HTTP, email, etc.). Amazon SNS delivers notifications to clients using a “push” mechanism that eliminates the need to periodically check or “poll” for new information and updates. Amazon SNS can be leveraged to build highly reliable, event-driven workflows and messaging applications without the need for complex middleware and application management. The potential uses for Amazon SNS include monitoring applications, workflow systems, time-sensitive information updates, mobile applications, and many others. As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use.
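The topic, subscribe and publish flow described above can be pictured with a tiny in-memory sketch. To be clear, this is not the Amazon SNS API and makes no network calls. TopicSketch and its methods are names I invented, and subscribers are plain Java callbacks standing in for HTTP or email endpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TopicSketch {
    // Topic name -> subscribers. Each subscriber is a callback that stands in
    // for a delivery endpoint (HTTP, email, ...).
    private final Map<String, List<Consumer<String>>> topics = new HashMap<>();

    public void createTopic(String name) {
        topics.putIfAbsent(name, new ArrayList<>());
    }

    public void subscribe(String topic, Consumer<String> subscriber) {
        subscribersOf(topic).add(subscriber);
    }

    // "Push" delivery: the publisher hands the message to every subscriber,
    // so nobody has to poll. Returns the number of deliveries made.
    public int publish(String topic, String message) {
        List<Consumer<String>> subscribers = subscribersOf(topic);
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
        return subscribers.size();
    }

    private List<Consumer<String>> subscribersOf(String topic) {
        List<Consumer<String>> subscribers = topics.get(topic);
        if (subscribers == null) {
            throw new IllegalArgumentException("unknown topic: " + topic);
        }
        return subscribers;
    }

    // Small demonstration with two hypothetical subscribers.
    public static List<String> demo() {
        List<String> inbox = new ArrayList<>();
        TopicSketch service = new TopicSketch();
        service.createTopic("alerts");
        service.subscribe("alerts", msg -> inbox.add("email:" + msg));
        service.subscribe("alerts", msg -> inbox.add("http:" + msg));
        service.publish("alerts", "disk full");
        return inbox;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only the shape of the interaction: subscribers register once and are then called whenever something is published.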

Saturday, April 10, 2010

Working with Spring Roo -1

Today I started working on a new task from my own TODO list: Spring Roo. I had already installed Spring Roo when Spring announced it with Spring 3.0. Over the next few days I'll be working on a small test app with Spring Roo.

Here's a little background on Spring Roo: it aims to make a coder's life easier by creating a project in a few steps. It works from the command line and can guide you if you don't know what to do next. Spring Roo uses AspectJ behind the scenes to inspect and create code for you.

I will update this post once I'm done with the Spring Roo test application.

You can download Spring Roo from here.

Tuesday, April 6, 2010

Installation Wiki

While installing JBoss Tools 3.1 on Eclipse 3.5.1 and creating an update site, you can run into a problem with the network proxy (depending on how security is set up for your web server): Eclipse asks for a user name and password to access the internet and reach the JBoss update site, even if it is a local site. Here's one workaround for this problem: add the line below to your eclipse.ini file (which is in your Eclipse folder) and restart Eclipse. This should resolve the proxy problem.

Set the following system property in your eclipse.ini file:


Also, while doing some research on this topic, I came across this site, which can be very useful for an enterprise, so that you don't end up writing a bunch of installation guides.

Saturday, April 3, 2010

MySQL on Mac

Today I'm working on installing MySQL on my Mac Pro. I found these useful sites for installing MySQL on a Mac...

Mac OS version 10.2 and higher

1- Download the package mentioned above to your desktop. Unpack it and then double-click on the .pkg file to install it.
2- Open a terminal window and type in the following commands (without the double quotes):
3- type cd /usr/local/mysql
4- type sudo chown -R mysql data/, enter your Mac OS X account password when asked for it.
5- To start the server, issue sudo echo first, then type sudo ./bin/mysqld_safe &
6- Use it with /usr/local/mysql/bin/mysql test

Useful Links:

One thing I would like to mention here: to add MySQL to your PATH in the bash shell, which is the default for new user accounts created under Mac OS X 10.3, the command is:

echo 'export PATH=/usr/local/mysql/bin:$PATH' >> ~/.bash_profile

Hope this will help.

Tuesday, March 30, 2010

Open Source Development Tools!!!

Here are some of the open source development tool choices, including IDEs, repositories, issue tracking and management, and open source server runtimes.

Eclipse, NetBeans, IntelliJ
Hudson, CruiseControl

I have used most of these tools, and liked most of them...

Try them out.

Saturday, March 27, 2010

What are RESTful Web Services?

A paper that expands on the basic principles of REST technology can be found at:

The REST architectural style is based on four principles:

Resource identification through URI.

A RESTful Web service exposes a set of resources which identify the targets of the interaction with its clients. Resources are identified by URIs [5], which provide a global addressing space for resource and service discovery.

Uniform interface.

Resources are manipulated using a fixed set of four create, read, update, delete operations: PUT, GET, POST, and DELETE. PUT creates a new resource, which can be then deleted using DELETE. GET retrieves the current state of a resource in some representation. POST transfers a new state onto a resource.

Self-descriptive messages.

Resources are decoupled from their representation so that their content can be accessed in a variety of formats (e.g., HTML, XML, plain text, PDF, JPEG, etc.). Metadata about the resource is available and used, for example, to control caching, detect transmission errors, negotiate the appropriate representation format, and perform authentication or access control.

Stateful interactions through hyperlinks.

Every interaction with a resource is stateless, i.e., request messages are self-contained. Stateful interactions are based on the concept of explicit state transfer. Several techniques exist to exchange state, e.g., URI rewriting, cookies, and hidden form fields. State can be embedded in response messages to point to valid future states of the interaction.
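As a rough illustration of the uniform interface principle, the sketch below keeps resources in an in-memory map keyed by URI and exposes only the four operations, with the meanings given above (PUT creates, GET reads, POST transfers a new state, DELETE removes). Everything here is hypothetical: the class name, the /accounts/1 URI and the choice of status codes are mine, and real services often divide the work between PUT and POST differently.

```java
import java.util.HashMap;
import java.util.Map;

public class UniformInterfaceSketch {
    // URI -> current representation of the resource.
    private final Map<String, String> resources = new HashMap<>();

    // PUT creates the resource at the given URI (or replaces it).
    public int put(String uri, String representation) {
        boolean existed = resources.containsKey(uri);
        resources.put(uri, representation);
        return existed ? 200 : 201;
    }

    // GET retrieves the current state, or null if there is no such resource.
    public String get(String uri) {
        return resources.get(uri);
    }

    // POST transfers a new state onto an existing resource.
    public int post(String uri, String newState) {
        if (!resources.containsKey(uri)) {
            return 404;
        }
        resources.put(uri, newState);
        return 200;
    }

    // DELETE removes the resource.
    public int delete(String uri) {
        return resources.remove(uri) != null ? 204 : 404;
    }

    // Walks one hypothetical resource through its whole life cycle.
    public static String demo() {
        UniformInterfaceSketch service = new UniformInterfaceSketch();
        int created = service.put("/accounts/1", "balance=0");
        int updated = service.post("/accounts/1", "balance=10");
        String state = service.get("/accounts/1");
        int deleted = service.delete("/accounts/1");
        int deletedAgain = service.delete("/accounts/1");
        return created + " " + updated + " " + state + " " + deleted + " " + deletedAgain;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every resource answers to the same four operations, a client needs no resource-specific method names, only the URI.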

Amazon Cloud Computing

Cloud computing is becoming a very hot area, as it provides cost savings and time-to-market benefits to a wide spectrum of organizations.

At the consumer end, small startup companies have found that cloud computing can significantly reduce their initial setup cost. Large enterprises have also found that cloud computing allows them to improve resource utilization and cost effectiveness, although they also have security and control concerns. Here is a very common cloud deployment model across many large enterprises.

Traditional software companies that distribute software on CD are also looking into the SaaS model as a new way of doing business. However, a SaaS model typically requires the company to build some kind of web site, and these companies may not have the expertise to build and operate large-scale web sites. Cloud computing allows them to outsource the SaaS infrastructure.

Here we look at the leader in the cloud computing provider space: AWS from Amazon.

Amazon Web Service

Amazon is the current leading provider in the cloud computing space. Its technology stack (known as Amazon Web Services) includes an IaaS stack, a PaaS stack and a SaaS stack.
Their IaaS stack includes infrastructure resources such as virtual machines, virtual mount disks, virtual networks, load balancers, VPN and databases.
Their PaaS stack provides a set of distributed computing services including queuing, data storage, metadata and parallel batch processing.
Their SaaS stack provides a set of high-level services such as a content delivery network, payment processing services and ecommerce fulfillment services.
Since we are focusing on the cloud computing aspects, we will describe their IaaS and PaaS stacks below but will skip their SaaS stack.

EC2 – Elastic Computing

Amazon has procured a large number of commodity Intel boxes running the Xen virtualization software. On top of Xen, Linux or Windows can be run as the guest OS. The guest operating system can have many variations, with different sets of software packages installed.

Each configuration is bundled as a custom machine image (called an AMI). Amazon hosts a catalog of AMIs for users to choose from. Some AMIs are free while others require a usage charge. Users can also customize their own setup by starting from a standard AMI, making their special configuration changes and then creating a specific AMI that is customized for their needs. The AMIs are stored in Amazon’s storage subsystem, S3.

Amazon also classifies its machines by capacity (number of cores, memory and disk size) and charges for their usage at different rates. These machines can be run in different network topologies specified by the users. There is an “availability zone” concept, which is basically a logical data center. Availability zones have no interdependency and are therefore very unlikely to fail at the same time. To achieve high availability, users should consider putting their EC2 instances in different availability zones.

The “security group” is the virtual firewall of the Amazon EC2 environment. EC2 instances can be grouped under a security group, which specifies which port is open to which incoming range of IP addresses. So EC2 instances running applications with various levels of security requirements can be put into appropriate security groups and managed using an ACL (access control list). This is very similar to how network administrators configure their firewalls.
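The "which port is open to which range of IP addresses" idea can be sketched in a few lines of plain Java. This is only my illustration of the concept and not the EC2 API. The class and method names are invented, it handles IPv4 only, and I am assuming that anything not explicitly opened is denied.

```java
import java.util.ArrayList;
import java.util.List;

public class SecurityGroupSketch {
    // One ingress rule: a port opened to a CIDR range of source addresses.
    static final class Rule {
        final int port;
        final int mask;
        final int network;

        Rule(int port, String cidr) {
            String[] parts = cidr.split("/");
            int prefix = Integer.parseInt(parts[1]);
            this.port = port;
            // A /0 prefix matches everything; Java shifts are modulo 32,
            // so that case is handled separately.
            this.mask = prefix == 0 ? 0 : -1 << (32 - prefix);
            this.network = toInt(parts[0]) & mask;
        }

        boolean allows(int requestedPort, String sourceIp) {
            return port == requestedPort && (toInt(sourceIp) & mask) == network;
        }
    }

    // Packs a dotted IPv4 address into a 32-bit integer.
    static int toInt(String ip) {
        int value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    private final List<Rule> rules = new ArrayList<>();

    public SecurityGroupSketch open(int port, String cidr) {
        rules.add(new Rule(port, cidr));
        return this;
    }

    // Assumption: traffic is denied unless some rule allows it.
    public boolean allows(int port, String sourceIp) {
        for (Rule rule : rules) {
            if (rule.allows(port, sourceIp)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical group: HTTP open to everyone, SSH only from 10.0.0.0/8.
    public static SecurityGroupSketch webTier() {
        return new SecurityGroupSketch().open(80, "0.0.0.0/0").open(22, "10.0.0.0/8");
    }

    public static void main(String[] args) {
        System.out.println(webTier().allows(22, "203.0.113.9"));
    }
}
```

Instances with different security requirements would simply be placed in groups with different rule lists.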

Users can start a virtual machine (called an EC2 instance) by specifying the AMI, the machine size, the security group and its authentication key via the command line or an HTTP/XML message. So it is very easy to start up a virtual machine and begin running the user’s application. When the application completes, the user can shut down the EC2 instance via the command line or an HTTP/XML message as well. The user is only charged for the actual time the EC2 instance is running.

One of the issues with an extremely dynamic machine configuration (such as EC2) is that a lot of configuration settings are transient and do not survive a reboot. For example, the node name and IP address may have changed, and all the data stored in local files is lost. Latency and network bandwidth between machines may also have changed. Fortunately, Amazon provides a number of ways to mitigate these issues.
For a charge, users can reserve a stable IP address, called an “elastic IP”, which can be attached to EC2 instances after they boot up. External-facing machines are typically set up this way.
To deal with data persistence, Amazon also provides a logical network disk, called “elastic block storage” (EBS), to store the data. For a charge, EBS is reserved for the user and survives EC2 reboots. Users can attach the EBS volume to EC2 instances after the reboot.

EBS – Elastic Block Storage

Based on RAID disks, EBS provides a persistent block storage device that users can attach to a running EC2 instance within the same availability zone. EBS is typically used as a file system mounted on an EC2 instance, or as a raw device for a database.

Although EBS is a network device to the EC2 instance, benchmarks from Amazon show that it has higher performance than local disk access. Unlike S3, which is based on an eventually consistent model, EBS provides strict consistency, where the latest updates are immediately available.

CloudWatch -- Monitoring Services

CloudWatch provides an API to extract system-level metrics for each VM (e.g. CPU, network I/O and disk I/O) as well as for each load balancer service (e.g. response time, request rate). The collected metrics are modeled as a multi-dimensional data cube and can therefore be queried and aggregated (e.g. min/max/avg/sum/count) along different dimensions, such as by time or by machine group (by AMI, by machine class, by particular machine instance id, by auto-scaling group).

These metrics are also used to drive the auto-scaling services (described below). Note that the metrics are predefined by Amazon, and custom metrics (application-level metrics) are not supported at this moment.
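To show what aggregating along a dimension means, here is a small sketch that groups some made-up CPU samples by a chosen dimension and computes min/max/avg/sum/count for each group. It uses only the Java standard library, not the CloudWatch API. The DataPoint class, its fields and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MetricCubeSketch {
    // One collected sample with two of the dimensions mentioned above.
    public static final class DataPoint {
        public final String instanceId;
        public final String machineClass;
        public final double cpuPercent;

        public DataPoint(String instanceId, String machineClass, double cpuPercent) {
            this.instanceId = instanceId;
            this.machineClass = machineClass;
            this.cpuPercent = cpuPercent;
        }
    }

    // Groups the samples by the given dimension; each group's statistics
    // object carries min, max, average, sum and count.
    public static Map<String, DoubleSummaryStatistics> aggregate(
            List<DataPoint> points, Function<DataPoint, String> dimension) {
        return points.stream().collect(
                Collectors.groupingBy(dimension,
                        Collectors.summarizingDouble(p -> p.cpuPercent)));
    }

    // Made-up sample data for the demonstration.
    public static List<DataPoint> sample() {
        List<DataPoint> points = new ArrayList<>();
        points.add(new DataPoint("i-1", "small", 20.0));
        points.add(new DataPoint("i-2", "small", 40.0));
        points.add(new DataPoint("i-3", "large", 90.0));
        return points;
    }

    public static void main(String[] args) {
        Map<String, DoubleSummaryStatistics> byClass =
                aggregate(sample(), p -> p.machineClass);
        System.out.println(byClass.get("small").getAverage());
    }
}
```

Passing a different function, for example p -> p.instanceId, slices the same data along another dimension.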

Load Balancing Services

A load balancer provides a way to group identical VMs into a pool. Amazon provides a way to create a software load balancer in a region and then attach EC2 instances (of the same region) to it. The EC2 instances under a particular load balancer can be in different availability zones, but they have to be in the same region.

Auto-Scaling Services

Auto-scaling allows the user to group a number of EC2 instances (typically behind the same load balancer) and specify a set of triggers to grow and shrink the group. A trigger defines a condition that compares the metrics collected by CloudWatch against some threshold values. When the condition matches, the associated action grows or shrinks the group.

Auto-scaling allows resource capacity (the number of EC2 instances) to be adjusted automatically to the actual workload. This way the user can automatically spawn more VMs as the workload increases and shut down VMs as the load decreases.
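The trigger logic can be sketched roughly as follows. This is a toy model, not Amazon's auto-scaling API: the `Trigger` fields, the default limits and the `next_group_size` function are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical trigger: thresholds on an averaged metric (e.g. CPU %)."""
    upper_threshold: float   # grow the group when the metric is above this
    lower_threshold: float   # shrink the group when the metric is below this
    step: int = 1            # how many instances to add or remove
    min_size: int = 1        # never shrink below this
    max_size: int = 10       # never grow above this


def next_group_size(current_size, metric_value, trigger):
    """Return the group size after matching one metric reading against the trigger."""
    if metric_value > trigger.upper_threshold:
        return min(current_size + trigger.step, trigger.max_size)
    if metric_value < trigger.lower_threshold:
        return max(current_size - trigger.step, trigger.min_size)
    return current_size  # metric is inside the band, so do nothing


cpu_trigger = Trigger(upper_threshold=70.0, lower_threshold=30.0)
size_after_spike = next_group_size(2, 85.0, cpu_trigger)   # grows the group
size_after_lull = next_group_size(2, 20.0, cpu_trigger)    # shrinks the group
```

Keeping a gap between the upper and lower thresholds is a deliberate choice in this sketch: it stops the group from flapping between two sizes when the metric hovers around a single value.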

Relational DB Services

RDS is basically MySQL running in EC2.

S3 – Simple Storage Service

Amazon S3 provides an HTTP/XML service to save and retrieve content. It provides a file system-like metaphor where “objects” are grouped under “buckets”. Based on a REST design, each object and bucket has its own URL.

With HTTP verbs (PUT, GET, DELETE, POST), the user can create a bucket, list all the objects within a bucket, create an object within a bucket, retrieve an object, remove an object, remove a bucket, etc.

Under S3, each object has a unique URI which serves as its key. There is no query mechanism in S3, so the user has to look up the object by its key. Each object is stored as an opaque byte array with a maximum size of 5GB. S3 also provides an interesting partial object retrieval mechanism: the request can specify a range of bytes to return (via the HTTP Range header).

Partial put, however, is not currently supported. It can be simulated by breaking a large object into multiple small objects and then doing the assembly at the application level. Breaking down the object also helps to speed up the upload and download, because the data transfer can be done in parallel.
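Here is a minimal sketch of that split-and-reassemble approach. To keep it runnable without an AWS account, it uses a made-up `InMemoryStore` class as a stand-in for S3; the part-naming scheme, the manifest object and the helper names are assumptions for illustration, not anything S3 prescribes.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class InMemoryStore:
    """Stand-in for a key/value object store. This is NOT the S3 API."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def upload_in_parts(store, key, data, chunk_size=CHUNK_SIZE):
    """Split `data` into parts, upload them in parallel, then write a manifest."""
    parts = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    part_keys = [f"{key}.part{n:05d}" for n in range(len(parts))]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(store.put, part_keys, parts))  # wait for all uploads
    # The manifest records the part order so a reader can reassemble the object.
    store.put(f"{key}.manifest", "\n".join(part_keys).encode("utf-8"))
    return part_keys


def download_from_parts(store, key):
    """Read the manifest, fetch the parts in parallel and join them in order."""
    manifest = store.get(f"{key}.manifest").decode("utf-8")
    part_keys = manifest.split("\n") if manifest else []
    with ThreadPoolExecutor(max_workers=4) as pool:
        parts = list(pool.map(store.get, part_keys))  # map preserves order
    return b"".join(parts)


store = InMemoryStore()
upload_in_parts(store, "video", b"abcdefghij")
restored = download_from_parts(store, "video")
```

Writing the manifest last means a reader never sees a manifest that points at parts which have not been uploaded yet. Against a real eventually consistent store, the reader would still need to tolerate parts that are not yet visible.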

Within Amazon S3, each object is replicated across two (or more) data centers and is also cached at the edge for fast retrieval.

Amazon S3 is based on an “eventually consistent” model, which means it is possible that an application won’t see a change it has just made. The application therefore needs some tolerance for an inconsistent view. It should avoid making two concurrent modifications to the same object, wait for some time between updates, and expect that any data it reads may be stale by a few seconds.

There is also no versioning concept in S3, but it is not hard to build one on top of S3.

SimpleDB – Queryable Data Storage

Unlike S3, where data has to be looked up by key, SimpleDB provides a semi-structured data store with querying capability. Each object is stored as a number of attributes, and the user can search for objects by their attributes.

Similar to the concepts of “buckets” and “objects” in S3, SimpleDB is organized as a set of “items” grouped by “domains”. However, each item can have a number of “attributes” (up to 256). Each attribute can store one or multiple values, and each value must be a string (or a string array in the case of a multi-valued attribute). Each attribute can store up to 1K bytes, so it is not appropriate for storing binary content.

SimpleDB is typically used as a metadata store in conjunction with S3, where the actual data is stored. SimpleDB is also schema-less: each item can define its own set of attributes and is free to add or remove attributes at runtime.

SimpleDB provides a query capability that is quite different from SQL. The “where” clause can only match an attribute value against a constant, not against other attributes. In addition, the query result returns only the names of the matched items, not their attributes, which means a subsequent lookup by item name is needed. Also, there is no equivalent of “order by”, so the returned query result is unsorted.
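The two-step access pattern (query for item names, then look up each item) can be illustrated with a toy model. Nothing below is the SimpleDB API: the `DOMAIN` dictionary, the `query` and `get_attributes` helpers and the sample items are assumptions made up for this sketch.

```python
import operator

# A hypothetical domain: item name -> attribute name -> list of string values.
# Items do not need to share the same attributes (schema-less).
DOMAIN = {
    "photo-001": {"owner": ["alice"], "tag": ["beach", "sunset"]},
    "photo-002": {"owner": ["bob"], "tag": ["city"]},
    "photo-003": {"owner": ["alice"], "tag": ["city"]},
}

# Comparisons are between an attribute value and a constant, both strings.
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def query(domain, attribute, op, constant):
    """Return only the NAMES of items where any value of `attribute` matches.

    A set is returned on purpose: the result carries no ordering.
    """
    compare = OPERATORS[op]
    return {
        name
        for name, attributes in domain.items()
        if any(compare(value, constant) for value in attributes.get(attribute, []))
    }


def get_attributes(domain, item_name):
    """Second step: fetch the attributes of one item by its name."""
    return domain[item_name]


names = query(DOMAIN, "owner", "=", "alice")
details = {name: get_attributes(DOMAIN, name) for name in names}
```

Any sorting has to happen in the application after the second step, once the attributes are in hand.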

Since all attributes are stored as strings (even numbers, dates, etc.), all comparison operations are done in lexical order. Therefore, data types such as dates and numbers need a special string encoding to make sure comparisons are done correctly.
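One common way to do such an encoding is to zero-pad numbers to a fixed width (after adding an offset so negatives sort correctly) and to write dates in ISO 8601 form. The sketch below assumes integers in the range [-10^9, 10^9) and timezone-aware datetimes; the function names and the chosen offset are illustrative, not something SimpleDB mandates.

```python
from datetime import datetime, timezone

OFFSET = 10**9   # assumed bound: values must satisfy -OFFSET <= value < OFFSET
WIDTH = 10       # enough digits for value + OFFSET, which is below 2 * 10**9


def encode_int(value):
    """Encode an integer so that string order equals numeric order."""
    if not -OFFSET <= value < OFFSET:
        raise ValueError("value outside the assumed range")
    return str(value + OFFSET).zfill(WIDTH)


def decode_int(text):
    """Reverse of encode_int."""
    return int(text) - OFFSET


def encode_date(moment):
    """Encode a timezone-aware datetime as a UTC ISO 8601 string.

    Fields go from most to least significant with fixed widths,
    so string order equals chronological order.
    """
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Plain strings sort "42" before "7"; the encoded forms sort numerically.
naive_order = sorted(["7", "42", "-5"])
encoded_order = [decode_int(s) for s in sorted(encode_int(n) for n in (7, 42, -5))]
```

The offset and width have to be fixed up front for a given attribute, since values encoded with different widths would no longer compare correctly.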

SimpleDB is also based on an eventual consistency model like S3.

SQS – Simple Queue Service

Amazon provides a queue service that lets applications communicate with each other asynchronously. Messages (up to 256KB in size) can be sent to queues. Each queue is replicated across multiple data centers.

Applications use the HTTP protocol to send messages to a queue. “At least once” semantics are provided, which means that when the sender gets back a 200 OK response, SQS guarantees that the message will be received by at least one receiver.

Receiving messages from a queue is done by polling rather than through an event-driven calling interface. Since messages are replicated asynchronously, it is possible that a receiver gets only some (but not all) of the messages sent to the queue. But if the receiver keeps polling the queue, it will eventually get all messages sent to the queue. On the other hand, a message can be delivered out of order or delivered more than once, so the message processing logic needs to be idempotent as well as independent of message arrival order.

Once a message is taken by a receiver, it becomes invisible to other receivers for a period of time, but it is not gone yet. The original receiver is supposed to process the message and make an explicit call to remove it permanently from the queue. If such a “removal” request is not made within the timeout period, the message becomes visible in the queue again and will be picked up by subsequent receivers.
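The receive / process / delete cycle, and why the processing logic needs to be idempotent, can be sketched with a small in-memory model. This is not the SQS API: `ToyQueue`, `IdempotentWorker`, the explicit `now` argument (a logical clock in seconds) and the 30-second timeout are all assumptions made up for this example.

```python
import itertools


class ToyQueue:
    """In-memory model of a queue with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}          # message_id -> body
        self._invisible_until = {}   # message_id -> time it becomes visible again
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = body
        self._invisible_until[message_id] = 0
        return message_id

    def receive(self, now):
        """Return one visible message and hide it for the timeout period."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Explicit removal; without it the message reappears after the timeout."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


class IdempotentWorker:
    """Remembers processed message ids so a redelivery has no extra effect."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, message_id, amount):
        if message_id in self.processed_ids:
            return False             # duplicate delivery, ignore it
        self.processed_ids.add(message_id)
        self.total += amount
        return True


queue = ToyQueue(visibility_timeout=30)
worker = IdempotentWorker()
sent_id = queue.send(100)

worker.handle(*queue.receive(now=0))    # processed, but never deleted
hidden = queue.receive(now=10)          # still inside the timeout window
worker.handle(*queue.receive(now=30))   # redelivered; the worker ignores it
queue.delete(sent_id)                   # now the message is really gone
```

In a real deployment the set of processed ids would have to live somewhere durable and shared, since a redelivered message may land on a different receiver.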

Elastic Map/Reduce

Amazon provides an easy way to run Hadoop Map/Reduce in the EC2 environment. It provides a web UI to start/stop a Hadoop cluster and submit jobs to it. For details of how Hadoop works, see here.

Under Elastic MR, both input and output data are stored in S3 rather than HDFS. This means data needs to be loaded into S3 before the Hadoop processing can start. Elastic MR also provides a job flow definition so the user can chain multiple Map/Reduce jobs together. Elastic MR supports programs written in Java (jar) or any other programming language (via Hadoop streaming), as well as Pig and Hive.

Virtual Private Cloud

VPC is a VPN solution that lets the user extend their data center to include EC2 instances running in the Amazon cloud. Notice that this is an "elastic data center" because its size can grow and shrink as the user starts and stops EC2 instances.

The user can create a VPC object, which represents an isolated virtual network in the Amazon cloud environment, and can create multiple virtual subnets under a VPC. When starting an EC2 instance, the subnet ID needs to be specified so that the instance is put into that subnet under the corresponding VPC.

EC2 instances under the VPC are completely isolated from the rest of Amazon's infrastructure at the network packet routing level (of course, it is software-implemented isolation). Then a pair of gateway objects (a VPN gateway on the Amazon side and a customer gateway on the data center side) needs to be created. Finally, a connection object is created that binds these two gateway objects together and is then attached to the VPC object.

After these steps, the two gateways will do the appropriate routing between your data center and the Amazon VPC, with VPN technologies used underneath to protect the network traffic.

Things to watch out for

I personally think Amazon provides a very complete set of services that is sufficient for a wide spectrum of deployment scenarios. Nevertheless, there are a number of limitations that one needs to pay attention to:
There are no cloud standards today. Whichever provider is chosen, the choice implies some degree of lock-in to a vendor-specific architecture, and Amazon is no exception. One way to minimize such lock-in is to introduce an insulation layer that localizes all the provider-specific APIs.
Cloud providers typically run their infrastructure on low-cost commodity hardware inside data centers connected by a network. Amazon is not very transparent about its hosting environment, so it is not clear how much reliability one can expect from it. Moreover, the SLA guarantee that Amazon is willing to provide is relatively low.
Multicast communication is not supported between EC2 instances. This means applications have to communicate using point-to-point TCP. Some cluster replication frameworks based on IP multicast simply don’t work in the EC2 environment.
An EBS volume currently cannot be attached to multiple EC2 instances at the same time. This means some applications (e.g. Oracle cluster) that rely on multiple machines accessing a shared disk simply won’t work in the EC2 environment.
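As for the insulation layer mentioned in the lock-in point above, one possible shape is a small interface that the application codes against, with one implementation per provider behind it. The sketch below is only an illustration under that assumption: `BlobStore`, `InMemoryBlobStore` and `save_report` are made-up names, and the in-memory class stands in for what would be an S3-backed implementation.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Provider-neutral interface; application code depends only on this."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryBlobStore(BlobStore):
    """Stand-in implementation. A provider-specific class would wrap that
    provider's storage API here and nowhere else in the code base."""

    def __init__(self):
        self._data = {}

    def put(self, key, data):
        self._data[key] = data

    def get(self, key):
        return self._data[key]


def save_report(store: BlobStore, name: str, text: str) -> str:
    """Application logic: knows about BlobStore, not about any provider."""
    key = f"reports/{name}.txt"
    store.put(key, text.encode("utf-8"))
    return key


store = InMemoryBlobStore()
report_key = save_report(store, "q4", "all systems nominal")
```

Switching providers then means writing one new `BlobStore` subclass rather than touching every call site, although provider-specific behavior (such as consistency guarantees) can still leak through such a layer.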