Tag: Oracle Database

Open Source and Extract, Transform & Load (ETL)

Open Source and Extract, Transform & Load (ETL)

What is ETL ?

Extract, Transform, and Load (ETL) is procedure to

a) Extracting data from outside sources
b) Transforming the data as ther operational requirements
c) Loading it into the end target (database or data warehouse)

1) Talend Open Studio ~ http://www.talend.com

Talend Open Studio operates as a code generator allowing data transformation scripts and underlying programs to be generated either in Perl or in Java.
Its GUI is made of a metadata repository and a graphical designer. The jobs are designed using graphical components, for transformation, connectivity or other operations. The threads created can be executed from within the studio or as standalone scripts.

2) CloverETL ~     http://www.cloveretl.com/

CloverETL is a Java-based data integration framework used to transform, cleanse, standardize and distribute data to applications, databases or warehouses. Its component-based structure allows easy customization and embeddability.

* CloverETL Designer – the graphical user interface to create and modify data transformations for CloverETL Server and Engine.
* CloverETL Engine – executes the transformations (run-time); can be embedded as a library. Available under LGPL.
* CloverETL Server – full-fledged server application with a rich WEB-based administrative interface, which leverages the existing CloverETL Engine.

3) Pentaho ~ http://www.pentaho.com/

The Pentaho BI Project is Open Source application software for enterprise reporting, analysis, dashboard, data mining, workflow and ETL capabilities for Business Intelligence (BI) needs.

4) Apatar ~ http://www.apatar.com/

Apatar is an open source ETL (Extract-Transform-Load) and mashup data integration software application. Other open source data integration projects are Clover.ETL, Pentaho Project, Talend Open Studio or Enhydra Octopus.

5) http://www.azinta.com/
Regards,

Arun

Partitioning in MySQL

Partitioning in MySQL

Introduction –

Partitioning is available in Oracle since 8i (few feature or type) version but in MySQL it is available after 5.1 version. Partitioning allows you to distribute portions of table across a file system (physically at different location). User can define the partitioning of table so that some portion of table can stored at different physical location….

Table is structure which has columns and rows. Partitioning can be applied to rows or volumes.

* Horizontal partitioning ~ Storing different rows of table to different physical location is called as horizontal partitioning of table.
* Vertical partitioning ~ Storing different columns of table to different physical location is called as vertical partitioning of table.

MySQL supports only support “Horizontal partitioning” !!

** Advantages of using partitions ~

a) Smaller and more manageable pieces of data
b) Faster access of data
c) Import / export at the partition Level
d) Each partitions is independent of other
e) Easy to use
f) Reduced recovery time
g) Failure impact is less

1] How to check whether MySQL installed on your system supports ‘partitioning’ ?

* Check MySQL version as…

root@laptop:~# mysqladmin version -u root -p

* Check ‘partition’ plugin is installed or not… ?

mysql> SHOW PLUGINS;
+————+———-+—————-+———+———+
| Name | Status | Type | Library | License |
+————+———-+—————-+———+———+
| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
| partition | ACTIVE | STORAGE ENGINE | NULL | GPL | **

…..

+————+———-+—————-+———+———+
10 rows in set (0.00 sec)

mysql>

* Check above output. If you don’t see ‘partition’ entry in above query. Then MySQL version does not support partitioning. If it entry exist then check “have_partitioning” system variable is set to ‘YES’ or not….

mysql> SHOW VARIABLES LIKE ‘%partition%’;
+——————-+——-+
| Variable_name | Value |
+——————-+——-+
| have_partitioning | YES |
+——————-+——-+
1 row in set (0.00 sec)
Partitioning
mysql>

2] Which storage engines (table type) supports partitioning?

MySQL partitioning engine runs in a separate layer and can interact with any of storage engine. So we can use most of the storage engine as backend/table type to store data. Note we must use same storage engine for all partitions of table.

3] Types of Partitions ~

There are 4 types of partitioning in MySQL or Oracle

a) Range Partitioning
b) List Partitioning
c) Hash Partitioning
d) Key Partitioning

NOTE ~ In any type of partitioning – partitions are always numbered automatically and in sequence starting with 0. When a new row is inserted into a partitioned table, these partition numbers that are used in identifying the correct partition. Also partition names are not case-sensitive. Number of partitions for the table, this must be expressed as a positive, non-zero integer literal with no leading zeroes, and may not be an expression.

MAXVALUE ~ It is an integer value that is always greater than the largest possible integer value (least upper limit).

** Let’s create database ‘mydata’ and create table with each partition type –

mysql> create database mydata;
Query OK, 1 row affected (0.33 sec)

mysql> show databases;
+——————–+
| Database |
+——————–+
| information_schema |
| mydata |
| mysql |
+——————–+
3 rows in set (0.00 sec)

mysql>

A] Range Partitioning ~ table is partitioned by range of values. Each partition contains row which lies within a given range. The ranges should be
contiguous but not overlapping…

mysql> use mydata;
Database changed

mysql> CREATE TABLE cricket (
-> player_name VARCHAR(30),
-> country_name VARCHAR(30),
-> match_count INT,
-> run_scored INT NOT NULL
-> )
-> PARTITION BY RANGE (run_scored)
-> (
-> PARTITION p0 VALUES LESS THAN (26),
-> PARTITION p1 VALUES LESS THAN (51),
-> PARTITION p2 VALUES LESS THAN (76),
-> PARTITION p3 VALUES LESS THAN (101),
-> PARTITION p4 VALUES LESS THAN MAXVALUE
->
-> );
Query OK, 0 rows affected (0.01 sec)

mysql> show tables from mydata;
+——————+
| Tables_in_mydata |
+——————+
| cricket |
+——————+
1 row in set (0.00 sec)

mysql> desc cricket;
+————–+————-+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+————–+————-+——+—–+———+——-+
| player_name | varchar(30) | YES | | NULL | |
| country_name | varchar(30) | YES | | NULL | |
| match_count | int(11) | YES | | NULL | |
| run_scored | int(11) | NO | | NULL | |
+————–+————-+——+—–+———+——-+
4 rows in set (0.00 sec)

mysql>

*** If cricketer’s score is…
<= 25 then row will go to p0 partition
<= 50 then row will go to p1 partition and so on ..
If score is more than 100 then row will go to p4 partition

* inserting few records to test behavior…

mysql> SELECT * FROM cricket;
+————-+————–+————-+————+
| player_name | country_name | match_count | run_scored |
+————-+————–+————-+————+
| Rahul | India | 2 | 16 |
| Sachin | India | 2 | 56 |
| Saurav | India | 2 | 80 |
| Sachin | India | 1 | 122 |
| Rahul | India | 1 | 153 |
| Saurav | India | 1 | 183 |
+————-+————–+————-+————+
6 rows in set (0.00 sec)

mysql>

** Let’s us check the physical location of database….

root@laptop:~# ls /var/lib/mysql/mydata/
cricket.frm cricket#P#p0.MYD cricket#P#p1.MYD cricket#P#p2.MYD cricket#P#p3.MYD cricket#P#p4.MYD db.opt
cricket.par cricket#P#p0.MYI cricket#P#p1.MYI cricket#P#p2.MYI cricket#P#p3.MYI cricket#P#p4.MYI
root@laptop:~#

B] LIST Partitioning ~ It is similar to range partitioning. But each partition is defined based on the columns value which is one of value in lists,
rather than in one of a set of contiguous ranges of values…

mysql> CREATE TABLE world (
-> country_name VARCHAR(25) NOT NULL,
-> country_id INT NOT NULL
-> )
-> PARTITION BY LIST (country_id)
-> (
-> PARTITION asia VALUES IN (91,95,35,20),
-> PARTITION europe VALUES IN (10,16,82,84),
-> PARTITION america VALUES IN (12,25),
-> PARTITION australia VALUES IN (60,65)
-> );
Query OK, 0 rows affected (0.02 sec)

mysql> show tables;
+——————+
| Tables_in_mydata |
+——————+
| cricket |
| world |
+——————+
2 rows in set (0.00 sec)

mysql>

* inserting few records to test behavior…

mysql> SELECT * FROM world;
+————–+————+
| country_name | country_id |
+————–+————+
| India | 91 |
| Sri Lanka | 95 |
| France | 10 |
| U.K. | 16 |
| Canada | 12 |
| Brazil | 60 |
+————–+————+
6 rows in set (0.00 sec)

mysql>

** What if ‘country_id’ value NOT matching with lists values ?

=> use IGNORE keyword while inserting records… let’s try

mysql> INSERT IGNORE INTO world VALUES (‘South Africa’,50);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM world;
+————–+————+
| country_name | country_id |
+————–+————+
| India | 91 |
| Sri Lanka | 95 |
| France | 10 |
| U.K. | 16 |
| Canada | 12 |
| Brazil | 60 |
+————–+————+
6 rows in set (0.00 sec)

mysql>

C] HASH Partitioning ~ HASH is used primarily to ensure an even distribution of data among a all number of partitions. If you do not include a
PARTITIONS clause while creating table with HASH partitioning, the number of partitions by defaults is one(1). The partition in
which the record is stored is determined by which_partition = MOD(expression, total_no_partition)

mysql> CREATE TABLE africa (
-> country_name VARCHAR(50),
-> population INT,
-> census_date DATE NOT NULL DEFAULT ‘2000-01-01’
-> )
-> PARTITION BY HASH (YEAR(census_date))
-> PARTITIONS 5;
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+——————+
| Tables_in_mydata |
+——————+
| africa |
| cricket |
| world |
+——————+
3 rows in set (0.01 sec)

mysql>

mysql> SELECT * FROM africa;
+————–+————+————-+
| country_name | population | census_date |
+————–+————+————-+
| Nigeria | 2278032 | 2008-09-18 |
| Egypt | 21278032 | 2008-05-28 |
| South Africa | 1678032 | 2009-04-20 |
+————–+————+————-+
3 rows in set (0.00 sec)

mysql>

D] KEY Partitioning ~ It is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression.

4] Subpartitioning ~

It is also known as composite partitioning in which partition is divided in to sub partition. Use SUBPARTITION clauses while creating table.

**Few useful command to know more about tables –

mysql> SHOW CREATE TABLE cricket \G
*************************** 1. row ***************************
Table: cricket
Create Table: CREATE TABLE `cricket` (
`player_name` varchar(30) DEFAULT NULL,
`country_name` varchar(30) DEFAULT NULL,
`match_count` int(11) DEFAULT NULL,
`run_scored` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (run_scored)
(PARTITION p0 VALUES LESS THAN (26) ENGINE = MyISAM,
PARTITION p1 VALUES LESS THAN (51) ENGINE = MyISAM,
PARTITION p2 VALUES LESS THAN (76) ENGINE = MyISAM,
PARTITION p3 VALUES LESS THAN (101) ENGINE = MyISAM,
PARTITION p4 VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */
1 row in set (0.00 sec)

mysql>

** Below statement will give information about all tables in given DB…

mysql> SHOW TABLE STATUS FROM mydata;

Thank you,
Arun Bagul

Oracle DB 10.2g installation on Ubuntu Linux

Oracle DB 10.2g installation on Ubuntu Linux

Introduction –

Oracle Database is widely used commercial Database on Unix/Linux and Windows platform….
This article explain the step by step installation of Oracle 10g Release 2.0 on Ubuntu 8.04111!!
Please refer the Oracle installation guide for System requirement…

1] Hardware requirement –

RAM – minimum 1024 MB of physical memory
Disk space -1.5 GB  to 3.5 GB of disk space, depends on the installation…
Minimum 400 MB of disk space in the /tmp directory

arunsb@arun:~$ free  -m
arunsb@arun:~$ df -h
arunsb@arun:~$ df -h /tmp

2] Software dependency –

Make sure that following packages are installed on your system!!

make-3.8
gcc-4.2
libdb4.2
libdb4.2++-dev
libstdc++6
libstdc++6-4.2-dev
libmotif-dev
libmotif3

root@arun:~# apt-get install  build-essential libc6-dev  libmotif-dev  libmotif3
root@arun:~# apt-get install  xauth libxp6 libxt6 libxtst6 libaio1  glibc-source
root@arun:~# apt-get install  rpm

root@arun:~# ln -s /usr/bin/awk   /bin/awk
root@arun:~# ln -s /usr/bin/basename     /bin/basename
root@arun:~# ln -s /usr/bin/rpm    /bin/rpm

root@arun:~# dpkg -l

3] Operating System Groups and Users for Oracle –

* oinstall – Oracle Inventory group
* dba       – The OS DBA group
* oracle   – Oracle software owner  (in mycase arunsb is oracle user)

root@arun:~# id oracle
id: oracle: No such user
root@arun:~#
root@arun:~# grep oinstall /etc/group
root@arun:~# grep dba  /etc/group

NOTE – “oinstall” should be primary group of oracle user. It is not necessary to use “oracle” as username, you can name but make sure that the “oinstall” is primary group of that user!!

root@arun:~# addgroup oinstall
Adding group ‘oinstall’ (GID 1001) …
Done.
root@arun:~#

root@arun:~# addgroup  dba
Adding group `dba’ (GID 1002) …
Done.
root@arun:~#

root@arun:~# addgroup nobody
Adding group `nobody’ (GID 1000) …
Done.
root@arun:~#

root@arun:~# useradd -g oinstall -G dba   arunsb
root@arun:~#

root@arun:~# id arunsb
uid=1000(arunsb) gid=1001(oinstall) groups=1001(oinstall),1002(dba)
root@arun:~#

*** other setting –

root@arun:~# echo “Red Hat Linux release 3.0 (drupal)” > /etc/redhat-release

root@arun:~# cat /etc/redhat-release
Red Hat Linux release 3.0 (drupal)
root@arun:~#

4] Configure Kernel Parameters –

Add following kernel variable value as shown below in sysctl.conf file, then run command “sysctl -p” to update the setting…
root@arun:~# tail -n 16  /etc/sysctl.conf
###############################
## Adding setting for oracle 10g

kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 262144

##############################
root@arun:~#

* Use below command to set above kernel variables…

root@arun:~# sysctl -p

5] Oracle user profile and envirnoment –

Update /etc/security/limits.conf file as shown below….

root@arun:~# vi /etc/security/limits.conf
root@arun:~# tail  /etc/security/limits.conf
#######################
## Added for arunsb (oracle user)

arunsb soft nproc 2047
arunsb hard nproc 16384
arunsb soft nofile 1024
arunsb hard nofile 65536

#######################
#
root@arun:~#

Add following setting in /etc/pam.d/login file
root@arun:~# vi /etc/pam.d/login

###################################
# oracle user –
session    required   pam_limits.so
###################################

** set envirnoment variables in user profile file “/home/arunsb/.bashrc”  as shown below

arunsb@arun:~$ tail  /home/arunsb/.bashrc
###########################
# Setting for Oracle 10g
export DISPLAY=”:0.0″
export ORACLE_OWNER=arunsb
export ORACLE_BASE=/u01/app
export ORACLE_HOME=/u01/app/product/10.2.0/db_1
export ORACLE_HOSTNAME=arun.mylaptop.com
PATH=$PATH:/u01/app/product/10.2.0/db_1/bin

##########################
#
arunsb@arun:~$

6] Oracle inventory and installation setting –

you need to create oralce Inventory director where Oracle stores all information about oralce software installed on your system. Also use same direcory
for oracle installation make sure to set proper path in  user profile file ie .barshrc

root@arun:~# mkdir -p  /u01/app/oraInventory
root@arun:~# ls -ld /u01/app/oraInventory
drwxr-xr-x 2 root root 4096 2008-11-23 22:57 /u01/app/oraInventory
root@arun:~# chown arunsb:oinstall -R /u01/app/oraInventory
root@arun:~# ls -ld /u01/app/oraInventory
drwxr-xr-x 2 arunsb oinstall 4096 2008-11-23 22:57 /u01/app/oraInventory
root@arun:~#

arunsb@arun:~$ mkdir -p /u01/app/product/10.2.0/db_1

7] It’s  time to install Oracle –

arunsb@arun:/root/oracle-setup/oracle_server-10g$ ./runInstaller
Starting Oracle Universal Installer…

In between installation you need to run following shell scripts as root user…..

root@arun:~# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory to 770.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete
root@arun:~# /u01/app/product/10.2.0/db_1/root.sh
Running Oracle10 root.sh script…

The following environment variables are set as:
ORACLE_OWNER= arunsb
ORACLE_HOME=  /u01/app/product/10.2.0/db_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin …
Copying oraenv to /usr/local/bin …
Copying coraenv to /usr/local/bin …
Creating /etc/oratab file…
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
root@arun:~#

8] Errors during installation –

NOTE – Installation failed many times with following ERRORs … please don’t forget to create soft link for awk , basename , rpm

Exception Name: MakefileException
Exception String: Error in invoking target ‘ntcontab.o’ of makefile

Exception Name: MakefileException
Exception String: Error in invoking target ‘utilities ctx_on’ of makefile

=================================================
The following J2EE Applications have been deployed and are accessible at the URLs listed below.

iSQL*Plus URL:
http://arun.mylaptop.com:5560/isqlplus
iSQL*Plus DBA URL:
http://arun.mylaptop.com:5560/isqlplus/dba
Oracle Enterprise manager URL
http://arun.mylaptop.com:5560/isqlplus/dba
=================================================

Thank you,
Arun Bagul