Return Only A Portion Of A Line After Matching A Pattern

Let’s say a file (fileName) contains the following data:

abcdddddfffbbbbbdddStartPatternSomeData1EndPattern
abcdddddfffdfdfsdsfStartPatternSomeData2EndPattern
fsdfdsddfdffbbbbbdddStartPatternSomeData3EndPattern
abcdddfsdfsdfdsbdddStartPatternSomeData4EndPattern

and you are interested in

SomeData1

SomeData2

SomeData3

SomeData4

The following command extracts that:

sed -n -e 's/^.*StartPattern//p' fileName | sed -n -e 's/EndPattern.*$//p'

Here:

  • -n means not to print anything by default.
  • -e is followed by a sed command.
  • s is the pattern replacement command.
  • The final p means to print the transformed line.

OCR With Tess4j

Tess4j is a JNA-based wrapper for the Tesseract OCR DLL. The library provides optical character recognition (OCR) support for:

  • TIFF, JPEG, GIF, PNG, and BMP image formats
  • Multi-page TIFF images
  • PDF document format

How To Run The Sample

Step 1: Download the Maven project from here

Step 2: Run the Example

Add VM Argument

64 bit

-Djna.library.path=${workspace_loc:/ocr-tess4j-example}/dlls/x64

32 bit

-Djna.library.path=${workspace_loc:/ocr-tess4j-example}/dlls/x86

Step 5: Output

Hadoop Up and running

Running Hadoop in Local (Standalone) mode

Environment Setup

If you don’t want dedicated hardware for quick setups and experiments, a virtual machine is the right choice.

Setup Virtual Machine

Step 1: Download Virtual Machine Player

You can download VM Player for free from here

Go ahead and install on your host OS.

Step 2: Download Virtual Machine

You can download a legal CentOS 6 image from here

You can even download Cloudera Quick Start VM, tailored for big data needs.

After downloading, extract it to a suitable place in your host OS.

Step 3: Start Virtual Machine

Click on “Open Virtual Machine”

Select the one you have already extracted

Click on “Play virtual Machine”

You may have to help VM Player: when prompted, hit “I copied it”

Log in with the password “tomtom”

A successful login shows the desktop screen

Setup Java

Step 1: Make sure CentOS system is fully up-to-date

Become super user and run yum update

[tom@localhost ~]$ su
Password:
[root@localhost tom]# yum update

Step 2: Install JDK

[root@localhost tom]# yum search java | grep -i --color 'JDK'

On Windows, installing the JDK is a one-click process.

[root@localhost tom]# yum install java-1.7.0-openjdk-devel.x86_64
[root@localhost tom]# javac -version
javac 1.7.0_55

On CentOS 6, the JDK gets installed under the following folder:

[root@localhost java-1.7.0-openjdk-1.7.0.55.x86_64]# ll /usr/lib/jvm
total 4
lrwxrwxrwx. 1 root root 26 Jun 20 10:29 java -> /etc/alternatives/java_sdk
lrwxrwxrwx. 1 root root 32 Jun 20 10:29 java-1.7.0 -> /etc/alternatives/java_sdk_1.7.0
drwxr-xr-x. 7 root root 4096 Jun 20 10:29 java-1.7.0-openjdk-1.7.0.55.x86_64
lrwxrwxrwx. 1 root root 34 Jun 20 10:29 java-1.7.0-openjdk.x86_64 -> java-1.7.0-openjdk-1.7.0.55.x86_64
lrwxrwxrwx. 1 root root 34 Jun 20 10:29 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx. 1 root root 21 Jun 20 10:29 jre -> /etc/alternatives/jre
lrwxrwxrwx. 1 root root 27 Jun 20 10:29 jre-1.7.0 -> /etc/alternatives/jre_1.7.0
lrwxrwxrwx. 1 root root 38 Jun 20 10:29 jre-1.7.0-openjdk.x86_64 -> java-1.7.0-openjdk-1.7.0.55.x86_64/jre
lrwxrwxrwx. 1 root root 29 Jun 20 10:29 jre-openjdk -> /etc/alternatives/jre_openjdk

Setup Hadoop

Step 3: Create new user to run hadoop

[root@localhost ~]# useradd hadoop
[root@localhost ~]# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@localhost ~]#

Step 4: Download Hadoop

[tom@localhost ~]$ su hadoop
Password:
[hadoop@localhost ~]$
[hadoop@localhost Downloads]$ wget 'http://apache.arvixe.com/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz'
[hadoop@localhost Downloads]$ tar -xvf hadoop-2.2.0.tar.gz

For now we can place hadoop under /usr/share/hadoop folder

[root@localhost Downloads]# mkdir /usr/share/hadoop
[root@localhost Downloads]# mv hadoop-2.2.0 /usr/share/hadoop/2.2.0

Step 5: Change Ownership

[root@localhost Downloads]# cd /usr/share/
[root@localhost share]# chown -R hadoop hadoop
[root@localhost share]# chgrp -R hadoop hadoop

Step 6: Update Environment Variables

[hadoop@localhost ~]$ vi ~/.bash_profile
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64
HADOOP_HOME=/usr/share/hadoop/2.2.0

export JAVA_HOME
export HADOOP_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export PATH

All of this is easier on Windows.

[hadoop@localhost Downloads]$ source ~/.bash_profile

[hadoop@localhost ~]$ vi /usr/share/hadoop/2.2.0/etc/hadoop/hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Step 7: Test Hadoop

[hadoop@localhost Downloads]$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/share/hadoop/2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

Step 8: Fix SSH Issues

If you get SSH-related errors while starting DFS (name node, data node) or YARN, it could be that SSH is not installed or not running.

Issue #1:

[hadoop@localhost ~]$ start-dfs.sh
VM: ssh: Could not resolve hostname VM: Name or service not known

Issue #2:

[hadoop@localhost Downloads]$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

Solution

Install SSH server/client

[root@localhost Downloads]# yum -y install openssh-server openssh-clients

Enable SSH

[root@localhost Downloads]# chkconfig sshd on
[root@localhost Downloads]# service sshd start
Generating SSH1 RSA host key: [ OK ]
Generating SSH2 RSA host key: [ OK ]
Generating SSH2 DSA host key: [ OK ]
Starting sshd: [ OK ]
[root@localhost Downloads]#

Make sure port 22 is open

[root@localhost Downloads]# netstat -tulpn | grep :22
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 8662/sshd
tcp 0 0 :::22 :::* LISTEN 8662/sshd

Create SSH keys with an empty passphrase so that you don’t have to enter a password manually while Hadoop works

[hadoop@localhost ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
08:d6:c1:66:2c:c8:c5:7b:96:d8:cb:fc:8d:19:16:38 hadoop@localhost.localdomain
The key's randomart image is:
+--[ DSA 1024]----+
| . +.o. |
| o o.=. |
| oB.o |
| .o.E.. |
| =.oS. |
| + o |
| o = |
| + . |
| |
+-----------------+
[hadoop@localhost ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[hadoop@localhost ~]$ chmod 644 ~/.ssh/authorized_keys
[hadoop@localhost ~]$

Verify that doing SSH does not prompt for password

[hadoop@localhost ~]$ ssh localhost
Last login: Fri Jun 20 15:08:09 2014 from localhost.localdomain
[hadoop@localhost ~]$

Start all Components

[hadoop@localhost Desktop]$ start-all.sh
[hadoop@localhost Desktop]$ jps
3542 NodeManager
3447 ResourceManager
3576 Jps

[Figure: Hadoop Cluster]

[Figure: HDFS]

Running Hadoop in Pseudo-Distributed mode

Step 1: Update core-site.xml

Add the following to core-site.xml

[hadoop@localhost ~]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<final>true</final>
</property>
</configuration>

Step 2: Update hdfs-site.xml

Add the following to hdfs-site.xml

[hadoop@localhost ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
<final>true</final>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

Step 3: Update mapred-site.xml

Add the following to mapred-site.xml

[hadoop@localhost ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/home/hadoop/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/home/hadoop/mapred/local</value>
<final>true</final>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>

Step 3: Update yarn-site.xml

Add the following to yarn-site.xml

[hadoop@localhost ~]$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

Step 4: Name node format

[hadoop@localhost ~]$ hdfs namenode -format
14/06/20 21:19:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.2.0
STARTUP_MSG: classpath = /usr/share/hadoop/2.2.0/etc/hadoop:/usr/share/hadoop/2.2.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/share/hadoop/2.2.0/share/hadoop/common/lib/hadoop-auth-2.2.0.jar *****
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common -r 1529768; compiled by 'hortonmu' on 2013-10-07T06:28Z
STARTUP_MSG: java = 1.7.0_55
************************************************************/
14/06/20 21:19:34 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/06/20 21:19:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-d9351768-346c-43e4-b0f2-5a182a12ec4a
14/06/20 21:19:36 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/06/20 21:19:36 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/06/20 21:19:36 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/06/20 21:19:36 INFO util.GSet: Computing capacity for map BlocksMap
14/06/20 21:19:36 INFO util.GSet: VM type = 64-bit
14/06/20 21:19:36 INFO util.GSet: 2.0% max memory = 966.7 MB
14/06/20 21:19:36 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/06/20 21:19:36 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/06/20 21:19:36 INFO blockmanagement.BlockManager: defaultReplication = 3
14/06/20 21:19:36 INFO blockmanagement.BlockManager: maxReplication = 512
14/06/20 21:19:36 INFO blockmanagement.BlockManager: minReplication = 1
14/06/20 21:19:36 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
14/06/20 21:19:36 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
14/06/20 21:19:36 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/06/20 21:19:36 INFO blockmanagement.BlockManager: encryptDataTransfer = false
14/06/20 21:19:36 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
14/06/20 21:19:36 INFO namenode.FSNamesystem: supergroup = supergroup
14/06/20 21:19:36 INFO namenode.FSNamesystem: isPermissionEnabled = false
14/06/20 21:19:36 INFO namenode.FSNamesystem: HA Enabled: false
14/06/20 21:19:36 INFO namenode.FSNamesystem: Append Enabled: true
14/06/20 21:19:36 INFO util.GSet: Computing capacity for map INodeMap
14/06/20 21:19:36 INFO util.GSet: VM type = 64-bit
14/06/20 21:19:36 INFO util.GSet: 1.0% max memory = 966.7 MB
14/06/20 21:19:36 INFO util.GSet: capacity = 2^20 = 1048576 entries
14/06/20 21:19:36 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/06/20 21:19:36 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/06/20 21:19:36 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/06/20 21:19:36 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
14/06/20 21:19:36 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/06/20 21:19:36 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/06/20 21:19:36 INFO util.GSet: Computing capacity for map Namenode Retry Cache
14/06/20 21:19:36 INFO util.GSet: VM type = 64-bit
14/06/20 21:19:36 INFO util.GSet: 0.029999999329447746% max memory = 966.7 MB
14/06/20 21:19:36 INFO util.GSet: capacity = 2^15 = 32768 entries
14/06/20 21:19:37 INFO common.Storage: Storage directory /home/hadoop/dfs/name has been successfully formatted.
14/06/20 21:19:37 INFO namenode.FSImage: Saving image file /home/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/06/20 21:19:37 INFO namenode.FSImage: Image file /home/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/06/20 21:19:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/06/20 21:19:37 INFO util.ExitUtil: Exiting with status 0
14/06/20 21:19:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[hadoop@localhost ~]$

Step 5: Start Hadoop components

[hadoop@localhost ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
14/06/20 21:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/share/hadoop/2.2.0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/share/hadoop/2.2.0/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/share/hadoop/2.2.0/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
14/06/20 21:23:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
resourcemanager running as process 3447. Stop it first.
localhost: nodemanager running as process 3542. Stop it first.
[hadoop@localhost ~]$ jps
3542 NodeManager
5113 DataNode
5481 Jps
5286 SecondaryNameNode
5016 NameNode
3447 ResourceManager

Step 6: Stop Hadoop components

[hadoop@localhost ~]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
14/06/20 21:27:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/06/20 21:28:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop

Windows Issues

Issue #1: JAVA_HOME related issue

Solution: set it differently

Issue #2: Failed to locate winutils binary

Solution: Download this, then copy the contents of its hadoop-common-2.2.0/bin folder into the HADOOP_HOME/bin folder. Alternatively, you can build Hadoop for your own environment as described here

Issue #3: “The system cannot find the batch label specified”

Solution: Download unix2dos from here and run it on all the Hadoop batch files.

Very Basic Object Oriented Concepts

Abstraction

An abstraction is a model of a complex system that includes only the details essential to the perspective of the viewer of the system. Abstractions are the fundamental way that we manage complexity. Different viewers use different abstractions of a particular system. Thus, while we see a car as a means of transportation, the automotive engineer may see it as a large mass with a small contact area between it and the road.

Abstractions are widely used in software development. UML diagrams provide abstractions by focusing on the fields (the state) and methods (the behavior) of a class. But at some levels, even the fields of a class may be irrelevant.

Abstractions are used to help understand complex systems.

  • Focus on essentials
  • Ignore the irrelevant
  • Ignore the unimportant

The only way for humans to deal with complexity is to avoid it, by working at higher levels of abstraction. We can get more done if we program by combining components of useful functionality rather than manipulating variables and control flow; that’s why most people order food from a menu in terms of dishes, rather than detail the recipes used to create them.

Data Types

Data types are abstractions of memory cells.

A data type focuses on:

  • The possible values it can hold
  • Possible operations that can be performed

It ignores:

  • How the bits are laid out
  • What is the mechanism to access the data
  • How the operations are performed

Abstract Data Types

An abstract data type (ADT) is a data type whose properties (values and operations) are specified independently of any particular implementation. All the Java built-in types are ADTs. A Java programmer can declare variables of those types without understanding the underlying implementation. The programmer can initialize, modify, and access the information held by the variables using the provided operations.

In addition to the built-in ADTs, Java programmers can use the Java class mechanism to build their own ADTs. For example, a Date class can be viewed as an ADT. It is true that the programmers who created it need to know about its underlying implementation; for example, they need to know that a Date is composed of three int instance variables, and they need to know the names of the instance variables. The application programmers who use the Date class, however, do not need this information. They only need to know how to create a Date object and how to invoke the exported methods to use the object.

Fraction ADT

Most programming languages have types for integers and real (decimal) numbers, but not for fractions. Such numbers can be implemented as objects. Here is a design for a fraction type:

ADT: Fraction

plus(Fraction): Fraction

times(Integer): Fraction

times(Fraction): Fraction

reciprocal(): Fraction

value(): Real

 

This ADT specifies five operations. Note that the times() operation is overloaded.

Note that the ADT uses generic terms for types: Integer instead of int, and Real instead of double. That is because it is supposed to be independent of any specific programming language.
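As a rough illustration, the Fraction ADT above could be realized in Java along the following lines. This is only a sketch: the field names, the use of long for storage, and the reduction to lowest terms are my own assumptions; the ADT itself fixes nothing beyond the five operations.

```java
// A minimal sketch of the Fraction ADT. Storage and normalization
// are illustrative assumptions, not part of the ADT specification.
final class Fraction {
    private final long numerator;
    private final long denominator;

    Fraction(long numerator, long denominator) {
        if (denominator == 0) {
            throw new IllegalArgumentException("denominator must not be zero");
        }
        // Keep the sign in the numerator and reduce to lowest terms.
        long sign = denominator < 0 ? -1 : 1;
        long divisor = gcd(Math.abs(numerator), Math.abs(denominator));
        this.numerator = sign * numerator / divisor;
        this.denominator = sign * denominator / divisor;
    }

    private static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    Fraction plus(Fraction other) {
        return new Fraction(
                numerator * other.denominator + other.numerator * denominator,
                denominator * other.denominator);
    }

    // times() is overloaded: one version per operand type.
    Fraction times(int factor) {
        return new Fraction(numerator * factor, denominator);
    }

    Fraction times(Fraction other) {
        return new Fraction(numerator * other.numerator,
                denominator * other.denominator);
    }

    // Throws IllegalArgumentException when this fraction is zero.
    Fraction reciprocal() {
        return new Fraction(denominator, numerator);
    }

    double value() {
        return (double) numerator / denominator;
    }

    @Override
    public String toString() {
        return numerator + "/" + denominator;
    }
}
```

Code that uses the class only sees the five operations: new Fraction(1, 2).plus(new Fraction(1, 3)) yields 5/6, and the caller never needs to know how the numerator and denominator are stored.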

Objects

An object is a representation of a “thing” (someone or something).

The thing can be anything: a real-life object or a more complex concept.

An object has certain characteristics, which are called properties or variables. For example, a cat object has a specific color, weight, and name.

An object has behavior (it performs actions), and these actions are called methods. For example, a cat object can sleep, hide, escape, etc.

In essence, an object contains a collection of related methods and data.

Much of the point of an object is to encapsulate access to its internals through its API and to hide the details from the rest of the system.

An analogy with spoken language.

  • Objects are most often named using nouns (eagle, parrot, Hyderabad)
  • Methods are verbs (sleep, escape, hide)
  • Values of the properties are adjectives (color red, weight 10kg)

Example Sentence:

The black cat sleeps on my head

“The cat” (a noun) is the object, “black” (an adjective) is the value of the color property, and “sleep” (a verb) is an action, or a method in OOP. “On my head” specifies something about the action “sleep”, so it acts as a parameter passed to the sleep method.
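The sentence can be sketched in Java roughly as follows. The Cat class, its field names, and the name “Tom” are hypothetical and only serve to map the nouns, adjectives, and verbs onto code.

```java
// Hypothetical Cat class: properties are fields, actions are methods.
class Cat {
    private final String name;      // property
    private final String color;     // property whose value is an adjective
    private final double weightKg;  // property

    Cat(String name, String color, double weightKg) {
        this.name = name;
        this.color = color;
        this.weightKg = weightKg;
    }

    String getColor() {
        return color;
    }

    // Method (a verb). The parameter says something about the action.
    String sleep(String where) {
        return "The " + color + " cat sleeps " + where;
    }
}

class CatDemo {
    public static void main(String[] args) {
        Cat cat = new Cat("Tom", "black", 4.2);
        System.out.println(cat.sleep("on my head"));
    }
}
```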

Example Object representation:

To represent a vehicle as an object, you would program its behaviors as methods and declare variables to hold information about its characteristics and states.

Objects communicate by sending and receiving messages.

A message is simply the name of the object followed by the name of the method. If a method requires any additional information in order to know precisely what to do, the message includes that information as a collection of data elements called parameters. The object that initiates the message is called the sender of the message, and the object that receives the message is called the receiver.

To make an automated vehicle move to a new location, some other object might send the following message.

vehicle107 moveTo :binB7


vehicle107 is the name of the receiver, moveTo is the method that is being asked to execute and binB7 is the parameter telling the receiver where to move.
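In Java, sending a message is written as a method call on the receiver. A small sketch, assuming a hypothetical AutomatedVehicle class and a hypothetical Dispatcher class that plays the sender:

```java
// Receiver: knows how to respond to the moveTo message.
class AutomatedVehicle {
    private final String id;
    private String location;

    AutomatedVehicle(String id, String startLocation) {
        this.id = id;
        this.location = startLocation;
    }

    void moveTo(String destination) {
        this.location = destination;
    }

    String getLocation() {
        return location;
    }
}

// Sender: initiates the message.
class Dispatcher {
    void dispatch(AutomatedVehicle vehicle, String destination) {
        // receiver.method(parameter) is the Java form of
        // "vehicle107 moveTo :binB7"
        vehicle.moveTo(destination);
    }
}

class MessageDemo {
    public static void main(String[] args) {
        AutomatedVehicle vehicle107 = new AutomatedVehicle("vehicle107", "dock");
        new Dispatcher().dispatch(vehicle107, "binB7");
        System.out.println(vehicle107.getLocation());
    }
}
```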

Classes

In real life, similar objects can be grouped based on some criteria. A hummingbird and an eagle are both birds, so they can be classified as belonging to the Bird class. In OOP, a class is a blueprint or recipe for an object. Another name for “object” is “instance”, so we say that the eagle is an instance of the Bird class. You can create different objects using the same class, because a class is just a template, while objects are concrete instances based on the template.

In short, a class is a software template that defines the methods and variables to be included in a particular kind of object. The methods and variables that make up the object are defined only once, in the definition of the class.

The purpose of the class is to specify the behavior of its instances. The specification has two components: a message interface and an implementation of that interface. The interface specifies what the class can do, and it consists of a list of messages that the class can respond to. The implementation specifies how those operations are carried out, and it consists of method code and variable definitions.
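Java lets you write the two components separately. The sketch below uses a hypothetical Flyer interface for the message interface and a Bird class for the implementation; the wingspan figures are made up for the example.

```java
// Message interface: what instances can be asked to do.
interface Flyer {
    String fly();
}

// Implementation: variable definitions plus method code.
class Bird implements Flyer {
    private final String species;
    private final int wingspanCm;

    Bird(String species, int wingspanCm) {
        this.species = species;
        this.wingspanCm = wingspanCm;
    }

    @Override
    public String fly() {
        return species + " flies with a " + wingspanCm + " cm wingspan";
    }
}

class BirdDemo {
    public static void main(String[] args) {
        // Two instances created from the same template.
        Flyer eagle = new Bird("eagle", 200);
        Flyer hummingbird = new Bird("hummingbird", 11);
        System.out.println(eagle.fly());
        System.out.println(hummingbird.fly());
    }
}
```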

Encapsulation

Packaging data (stored in properties) together with the means to do something with that data (using methods) is called encapsulation.

Encapsulation ensures that the behavior of an object can only be affected through its API. It lets us control how much a change to one object will impact other parts of the system by ensuring that there are no unexpected dependencies between unrelated components. Encapsulation is often mixed up with information hiding, which is an altogether different concept.

Information hiding conceals how an object implements its functionality behind the abstraction of its API. It lets us work with higher-level abstractions by ignoring lower-level details that are unrelated to the task at hand.

Why Encapsulate

When working with badly encapsulated code, we spend too much time tracing what the potential effects of a change might be, looking at where objects are created, what common data they hold, and where their contents are referenced. Maintainability is the goal: the ability to change the code without fear, without hesitation, and without feeling resistance, so that quick change is something we are eager to do. If the system largely lacks encapsulation, it is difficult to change and hence cannot evolve.

Many object-oriented languages support encapsulation by providing control over the visibility of an object’s features to other objects, but that is not enough. Objects can break encapsulation by sharing references to mutable objects, an effect known as aliasing.

Aliasing is essential for conventional object-oriented systems (otherwise no two objects would be able to communicate), but accidental aliasing can couple unrelated parts of the system so that it behaves mysteriously and is inflexible to change.

How to Ensure Encapsulation

  • Define immutable value types
  • Avoid global variables and singletons
  • Copy collections and mutable values when passing them between objects
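Two of these guidelines can be sketched in Java as follows. The Weight and Shipment classes are hypothetical; they only illustrate an immutable value type and defensive copying of a collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Immutable value type: the field is final and there are no setters.
final class Weight {
    private final double kilograms;

    Weight(double kilograms) {
        this.kilograms = kilograms;
    }

    double getKilograms() {
        return kilograms;
    }

    // "Changing" a value produces a new object instead of mutating this one.
    Weight plus(Weight other) {
        return new Weight(kilograms + other.kilograms);
    }
}

class Shipment {
    private final List<String> items;

    Shipment(List<String> items) {
        // Copy on the way in, so the caller's list is not aliased.
        this.items = new ArrayList<>(items);
    }

    List<String> getItems() {
        // Copy on the way out and make the copy read-only.
        return Collections.unmodifiableList(new ArrayList<>(items));
    }

    void addItem(String item) {
        items.add(item);
    }
}
```

With this shape, a caller that keeps modifying the list it passed in cannot change the Shipment behind its back, and the list returned by getItems() rejects modification.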

Association

There are several ways to associate one object with another: composition, aggregation, and inheritance.

Aggregation and Composition (Objects Inside Other Objects)

A variable contained within an object can be used in two different ways.

  • They can be used to store data values.
  • They can contain references to other objects

A reference held by a variable provides the containing object with a handle through which it can manage its complexity by sending appropriate messages to its components (the contained objects).

Combining several objects into one is known as aggregation or composition. Aggregation is a powerful way to separate a problem into smaller and more manageable parts. When a problem scope is so complex that it is impossible to think about it at a detailed level in its entirety, you can separate the problem into several smaller areas and then possibly separate each of these into even smaller chunks. This allows you to think about the problem at several levels of abstraction.

Objects that contain other objects are called composite objects. Composite objects are important because they can represent far more sophisticated structures than simple objects can.

[Figure: components of an aircraft]

Another example would be a Book object, which can contain (aggregate) one or more author objects, a publisher object, several chapter objects, a table of contents, and so on.

The objects contained in composite objects may themselves be composite objects, and this nesting can be carried out to any number of levels.

Difference Between Aggregation and Composition

 

When a containing object (University) controls all access to a contained object (Department), we say that the containing object is a composition of the contained objects. For example, a University object would be a composition of Department objects. Each Department object belongs to a unique University object, which controls access to its departments. In this case we say that the University owns its Departments.

When a containing object (Department) references other objects (Professor) that are also accessible outside of the containing object, we say that the containing object is an aggregation of the contained objects. For example, a Department object contains references to the Professor objects who are members of the department, but who also exist outside of the department. In fact, a professor could be a member of two different departments. In this case we say that the Department has a Professor object.
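One possible way to express the difference in Java is sketched below. Making Department a private nested class is my own choice for showing ownership, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// A Professor exists independently of any department.
class Professor {
    private final String name;

    Professor(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class University {
    // Composition: departments are created and owned here and are never
    // handed out, so all access to them goes through University.
    private static final class Department {
        final String name;
        // Aggregation: the department only references professors
        // that were created elsewhere.
        final List<Professor> members = new ArrayList<>();

        Department(String name) {
            this.name = name;
        }
    }

    private final List<Department> departments = new ArrayList<>();

    void openDepartment(String name) {
        departments.add(new Department(name));
    }

    void assign(Professor professor, String departmentName) {
        find(departmentName).members.add(professor);
    }

    int memberCount(String departmentName) {
        return find(departmentName).members.size();
    }

    private Department find(String departmentName) {
        for (Department department : departments) {
            if (department.name.equals(departmentName)) {
                return department;
            }
        }
        throw new IllegalArgumentException("unknown department: " + departmentName);
    }
}
```

The same Professor object can be assigned to two departments, while a Department cannot even be named outside its University.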

Benefits of Aggregation (and Composition)

  • Matches how real-world things are structured.
  • Composite objects lay the foundation for a mechanism called delegation, in which an object assigns a task to another object. Through delegation we achieve division of labor.

 

Inheritance

The mechanism whereby one class of objects can be defined as a special case of a more general class is known as inheritance. Special cases of a class are commonly known as subclasses of that class; the more general class, in turn, is known as the superclass of its special cases. In addition to the methods and variables they inherit, subclasses may define their own methods and variables. They can also redefine any of the inherited methods, customizing them for their own needs. This way the interface stays the same (the method name is the same), but when called on the new object, the method behaves differently. This way of redefining how an inherited method works is known as overriding.

In simple terms, when a class A includes all the members of a class B, we say that A is an extension of B, and that it inherits all the properties of B. Often the following phrases can be used interchangeably: “A inherits from B” and “A extends B”. For example, a Professor class would be an extension of a Person class. If A is an extension of B, we say that an A object “is a” B object. For example, a professor is a person.

For example, the class AutomatedVehicle could be broken down into two subclasses, PalletAGV and RollAGV, each of which inherited the general characteristics of the parent class. Either subclass could establish its own special characteristics by adding to the parent’s definition or by overriding its behavior.
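A sketch of that breakdown in Java follows. The class names come from the text, but the methods (describeLoad, moveTo, clampRoll) and their return strings are hypothetical.

```java
class AutomatedVehicle {
    protected final String id;

    AutomatedVehicle(String id) {
        this.id = id;
    }

    // General behavior that subclasses may override.
    String describeLoad() {
        return "a generic load";
    }

    // Inherited unchanged by both subclasses.
    String moveTo(String destination) {
        return id + " carries " + describeLoad() + " to " + destination;
    }
}

class PalletAGV extends AutomatedVehicle {
    PalletAGV(String id) {
        super(id);
    }

    @Override
    String describeLoad() {   // overriding the parent's behavior
        return "a pallet";
    }
}

class RollAGV extends AutomatedVehicle {
    RollAGV(String id) {
        super(id);
    }

    @Override
    String describeLoad() {
        return "a roll";
    }

    // Adding to the parent's definition.
    String clampRoll() {
        return id + " clamps the roll";
    }
}
```

Calling moveTo on a PalletAGV runs the inherited moveTo code, which in turn picks up the overridden describeLoad.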


Hierarchies of Classes

Classes can be nested to any degree, and inheritance will automatically accumulate down through all the levels. The resulting treelike structure is known as a class hierarchy.


An instance of, say, VariableSpeedDriveMotor would inherit all the characteristics of the Part class, as well as those of Motor and DriveMotor.

Class hierarchies increase the ability of objects to reflect the way we view the real world. Human knowledge is often organized in a hierarchical manner, relying on generic concepts and their refinement into increasingly specialized cases.
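The accumulation of characteristics down a hierarchy can be sketched like this. The four class names follow the example above; the fields and methods are hypothetical.

```java
class Part {
    private final String partNumber;

    Part(String partNumber) {
        this.partNumber = partNumber;
    }

    String getPartNumber() {
        return partNumber;
    }
}

class Motor extends Part {
    private final int powerWatts;

    Motor(String partNumber, int powerWatts) {
        super(partNumber);
        this.powerWatts = powerWatts;
    }

    int getPowerWatts() {
        return powerWatts;
    }
}

class DriveMotor extends Motor {
    DriveMotor(String partNumber, int powerWatts) {
        super(partNumber, powerWatts);
    }

    String drive() {
        return "driving at fixed speed";
    }
}

class VariableSpeedDriveMotor extends DriveMotor {
    private int speedPercent = 100;

    VariableSpeedDriveMotor(String partNumber, int powerWatts) {
        super(partNumber, powerWatts);
    }

    void setSpeedPercent(int speedPercent) {
        this.speedPercent = speedPercent;
    }

    @Override
    String drive() {
        return "driving at " + speedPercent + "% speed";
    }
}
```

A VariableSpeedDriveMotor instance answers getPartNumber() (from Part), getPowerWatts() (from Motor), and drive() (from DriveMotor, overridden here) without redefining the first two.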

Polymorphism

Polymorphism, from the Greek “poly”, for many, and “morph”, for form, means that the same thing can have different forms (or shapes). It’s a technical term in many fields, including chemistry, biology, and (of course) computer science. Each field defines it in terms relevant to that field of study, but it all boils down to having multiple forms. In the physical world, water is a good example of polymorphism. In its natural state, water is a liquid. When frozen, that liquid becomes a solid block of ice. But when boiled, water turns into a gas.

Polymorphism (a consequence of inheritance) is an OOP feature that enables an object to determine which method implementation to invoke upon receiving a method call. In some programming languages, polymorphism is also called late binding, runtime binding, or dynamic binding.

The term polymorphism in essence implies ‘multiple bodies’ that provide the same behavior, so it is essentially about intent. For example, simply connecting to a different server that provides the same web service is employing polymorphism.

Behold, there are many types

At the computer-language level, there are four kinds of polymorphism: coercion, overloading, parametric, and inclusion.

The first kind of polymorphism, coercion polymorphism, refers to a single operation serving several types through implicit type conversion. For example, the multiplication operation, which manifests itself in source code through the multiplication operator symbol (*), allows you to multiply an integer by another integer and a floating-point value by another floating-point value. However, if one operand is an integer and the other operand a floating-point value, the compiler must coerce (convert) the integer’s operand type to floating-point. Otherwise, a type error occurs—because Java’s multiplication operation does not multiply integers by floating-point values, or vice versa. Another example of coercion polymorphism involves method calls. If a class declares a method with a superclass parameter and if a call is made to that method with a subclass object reference, the compiler implicitly coerces (converts) the subclass reference type to the superclass reference type. That way, only superclass-defined operations are legal (without explicit type casts) in the method.
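Both forms of coercion can be seen in a few lines of Java. The class and method names below are hypothetical.

```java
class CoercionDemo {
    static double area(double width, double height) {
        return width * height;
    }

    // Declared with a superclass parameter (Object).
    static String nameOf(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        int count = 3;
        double price = 2.5;

        // The compiler widens count from int to double before multiplying.
        double total = count * price;
        System.out.println(total);

        // The int arguments are widened to double to match the parameters.
        System.out.println(area(2, 3));

        // The String reference is implicitly converted to Object, so only
        // Object-defined operations are available inside nameOf.
        System.out.println(nameOf("text"));
    }
}
```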

The second kind of polymorphism, overloading polymorphism, refers to using a single operator symbol or method name for different operations. For example, the + operator symbol signifies any one of several operations based on its operands’ types. If both operands have integer types, the integer addition operation takes place. Similarly, if both operands have floating-point types, a floating-point operation takes place. Finally, if those operands are strings, string concatenation will be performed. Along with its language-defined operator overloading, Java also permits method names to overload, provided that the number and/or types of each method’s parameters differ. That way, the same method-name identifier can apply to different operations.
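A short sketch of method overloading, using a hypothetical add method whose three versions differ only in their parameter types:

```java
class OverloadDemo {
    static int add(int a, int b) {
        return a + b;            // integer addition
    }

    static double add(double a, double b) {
        return a + b;            // floating-point addition
    }

    static String add(String a, String b) {
        return a + b;            // string concatenation
    }

    public static void main(String[] args) {
        // The compiler picks the version from the argument types.
        System.out.println(add(1, 2));
        System.out.println(add(1.5, 2.0));
        System.out.println(add("a", "b"));
    }
}
```

The + operator inside each method is itself overloaded by the language in exactly the way described above.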

Many developers do not feel that coercion and overloading polymorphism represent true forms of polymorphism. On closer inspection, they look more like convenient type-conversion aids and syntactic sugar. In contrast, parametric and inclusion polymorphism are considered to be genuine polymorphism.

The third kind of polymorphism, parametric polymorphism, refers to a class declaration that allows the same field names and method signatures to associate with a different type in each instance of that class. For example, you might create a Set class with a value field that holds any type of referenced data item. You do not want to give that field the type Object (as in Object value;), because then the compiler cannot tell you when the code attempts an invalid operation on value: only the JVM knows the actual type of value at runtime. However, you also do not want to tie value to a specific type in your source code, because you would lose the ability to store different object types in your Set objects. Parametric polymorphism gives you the best of both worlds: the compiler still checks types and alerts you to invalid operations on value, and value can still hold references to different object types.
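Java expresses parametric polymorphism with generics. The sketch below uses a hypothetical Holder class (not the Set class mentioned above) with a single value field whose type is supplied by each instance.

```java
// Hypothetical generic class: T is chosen separately for each instance.
class Holder<T> {
    private final T value;

    Holder(T value) { this.value = value; }

    T get() { return value; }
}

class ParametricDemo {
    public static void main(String[] args) {
        Holder<String> name = new Holder<>("duck");
        Holder<Integer> count = new Holder<>(3);

        int length = name.get().length(); // no cast: the compiler knows this is a String
        int next = count.get() + 1;       // no cast: the compiler knows this is an Integer

        // name.get().intValue();         // would not compile: String has no intValue()

        System.out.println(length + " " + next); // 4 4
    }
}
```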

The final kind of polymorphism, inclusion polymorphism, refers to a situation in which a type can be another type’s subtype. Every subtype value can appear in a supertype context, where the execution of the supertype’s operations (on that value) results in the execution of the subtype’s equivalent operations. For that reason, inclusion polymorphism is also known as subtype polymorphism.

Examples

In Java and other OOP languages, it is legal to assign to a reference variable an object whose type is different from the variable’s type, if certain conditions are met. In essence, if you have a reference variable a whose type is A, it is legal to assign it an object of type B, like this:

A a = new B();

provided one of the following conditions is met.

  • A is a class and B is a subclass of A.
  • A is an interface and B or one of its parents implements A.

When you assign an instance of B to a as in the code above, a is of type A. This means you cannot call a method in B that is not defined in A. However, if you print the value of a.getClass().getName(), you’ll get “B” and not “A”. So, what does this mean? At compile time, the type of a is A, so the compiler will not allow you to call a method in B that is not defined in A. At runtime, on the other hand, the type of the object that a refers to is B, as shown by the return value of a.getClass().getName().
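Here is a minimal sketch of that difference between the compile-time and runtime type. The method onlyInB and the class TypeDemo are assumptions added for the illustration.

```java
class A {
    String name() { return "A"; }
}

class B extends A {
    // Hypothetical method that exists only in B.
    String onlyInB() { return "only in B"; }
}

class TypeDemo {
    // Returns the runtime class of whatever object the A-typed reference points to.
    static Class<?> runtimeClass(A a) {
        return a.getClass();
    }

    public static void main(String[] args) {
        A a = new B();

        // a.onlyInB();  // would not compile: the compile-time type of a is A

        // Prints the runtime class name; that is "B" when these classes
        // are compiled in the default (unnamed) package.
        System.out.println(a.getClass().getName());

        // An explicit cast makes B's extra method reachable again.
        System.out.println(((B) a).onlyInB());
    }
}
```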

Now, here comes the essence of polymorphism. If B overrides a method (say, a method named play) in A, calling a.play() will cause the implementation of play in B (and not in A) to be invoked. Polymorphism enables an object (in this example, the one referenced by a) to determine which method implementation to choose (either the one in A or the one in B) when a method is called. Polymorphism dictates that the implementation in the runtime object be invoked.

What if you call another method on a (say, a method called stop) and the method is not implemented in B? The JVM is smart enough to know this and looks into the inheritance hierarchy of B. B, as it happens, must be a subclass of A or, if A is an interface, a subclass of another class that implements A. Otherwise, the code would not have compiled. Having figured this out, the JVM climbs the hierarchy, finds the implementation of stop, and runs it.
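The play and stop scenario can be sketched as follows. The returned strings and the class DispatchDemo are made up for the example; the point is only which implementation runs.

```java
class A {
    String play() { return "A.play"; }
    String stop() { return "A.stop"; }
}

class B extends A {
    // B overrides play but does not implement stop.
    @Override
    String play() { return "B.play"; }
}

class DispatchDemo {
    public static void main(String[] args) {
        A a = new B();
        System.out.println(a.play()); // B.play: the runtime object's implementation is used
        System.out.println(a.stop()); // A.stop: found by walking up B's hierarchy
    }
}
```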

For example, you can have a generic object Animal, which has a property such as name and implements the functionality walk, speak, sleep, and eat. Then you figure out that you need a Duck object, a Dog object, and a Cat object. You could re-implement all the methods and properties that Animal has, but it would be smarter to say that those objects inherit from Animal and save yourself some work. The specific objects only need to override the methods that they do differently (in this case speak), while reusing the rest of Animal’s functionality.

This means that both the generalized object and the specialized objects have a speak method. Now imagine that somewhere in the code there is a variable called sweetie, and we don’t know whether sweetie is a Cat, a Dog, or a Duck. We can still call the speak method on sweetie, and the code will work.
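The Animal example might look like this in Java. It is a trimmed-down sketch: only name, walk, and speak are shown, and the sounds, pet names, and the greet helper are assumptions added for the illustration.

```java
class Animal {
    private final String name;

    Animal(String name) { this.name = name; }

    String getName() { return name; }

    // Shared behaviour that every subclass reuses unchanged.
    String walk() { return name + " walks"; }

    // Default behaviour that subclasses are expected to override.
    String speak() { return "..."; }
}

class Duck extends Animal {
    Duck(String name) { super(name); }
    @Override String speak() { return "Quack"; }
}

class Dog extends Animal {
    Dog(String name) { super(name); }
    @Override String speak() { return "Woof"; }
}

class Cat extends Animal {
    Cat(String name) { super(name); }
    @Override String speak() { return "Meow"; }
}

class AnimalDemo {
    // Works for any Animal without knowing whether it is a Cat, a Dog or a Duck.
    static String greet(Animal sweetie) {
        return sweetie.getName() + " says " + sweetie.speak();
    }

    public static void main(String[] args) {
        Animal[] pets = { new Cat("Tom"), new Dog("Rex"), new Duck("Don") };
        for (Animal sweetie : pets) {
            System.out.println(greet(sweetie));
        }
    }
}
```

The greet method never checks the concrete type; each object supplies its own speak implementation at runtime, while walk is inherited as-is from Animal.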

Why Object Oriented Programming

 

Modularization

Decompose a problem into smaller subproblems that can be solved separately.

Abstraction – Understandability

Terminology of the problem domain is reflected in the software solution. Individual modules are understandable by human readers.

Encapsulation

 

Composability – Structured Design

Interfaces allow modules to be freely combined to produce new systems.

Hierarchy

 

Incremental development from small and simple to more complex modules.

Continuity

Changes and maintenance confined to a few modules do not affect the architecture.

 
