Configuring Cassandra Cluster on cloud with Load Balancer

Cloud Used: Rackspace

Load Balancer used: HAProxy

OS: CentOS 5.5

Cassandra version used: apache-cassandra-0.6.5

Below are the steps to cluster Cassandra through HAProxy on Rackspace Cloud:

1.     Install HAProxy on any node. I am currently using CentOS 5.5.

2.     Install Cassandra as the seed node on another machine. By default, Cassandra uses port 7000 for cluster communication, 9160 for clients (Thrift), and 8080 for JMX.

3.     Change the Cassandra clustering configuration on the seed node in the file $CASSANDRA_HOME/conf/storage-conf.xml as follows:

1)   In the Seeds section, enter the IP of the HAProxy load balancer node

<Seeds>

<Seed>HAProxy_Load_Balancer_IP</Seed>

</Seeds>

2)   Enter the IP of the Cassandra seed node in ListenAddress and ThriftAddress

<ListenAddress>cassandra_seed_ip</ListenAddress>

<ThriftAddress>cassandra_seed_ip</ThriftAddress>

4.     Open the Cassandra ports on the seed node by running the following commands at the command prompt:

iptables -I INPUT 1 -p tcp --dport 7000 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 9160 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 8080 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart
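
The three rule/save/restart rounds can also be collapsed into one loop with a single save at the end. What follows is only a minimal sketch, assuming bash and the same CentOS init script; the open_ports helper and the DRY_RUN switch are names made up for illustration, not part of iptables. With DRY_RUN=1 (the default here) it only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: open a list of TCP ports, then persist the rules once.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 and run as root to apply.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_ports() {
    local port
    for port in "$@"; do
        # Insert an ACCEPT rule at the top of the INPUT chain for this port.
        run iptables -I INPUT 1 -p tcp --dport "$port" -j ACCEPT
    done
    # Persist the rules once, after all of them have been added.
    run /etc/init.d/iptables save
}

# Cassandra node ports from this post: cluster, Thrift, JMX.
open_ports 7000 9160 8080
```

The restart is left out of this sketch because inserted rules are active immediately and the save step is what persists them. The same helper could be called with 7000 9160 8000 on the HAProxy node.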

5.     Edit the HAProxy configuration file /etc/haproxy.cfg on the HAProxy node to add the Cassandra port configuration as follows:

listen cassandraseed

bind *:7000

mode tcp

option tcplog

log global

balance roundrobin

clitimeout 150000

srvtimeout 150000

contimeout 30000

server server1 cassandraSeedNodeIP:7000 check

listen cassandrathrift

bind *:9160

mode tcp

option tcplog

log global

balance roundrobin

clitimeout 150000

srvtimeout 150000

contimeout 30000

server server1 cassandraSeedNodeIP:9160 check

listen cassandrajmx

bind *:8000

mode tcp

option tcplog

log global

balance roundrobin

clitimeout 150000

srvtimeout 150000

contimeout 30000

server server1 cassandraSeedNodeIP:8080 check
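
The three listen sections differ only in their name, bind port and backend port, so they can be generated instead of typed by hand. The following is a rough sketch under that assumption; listen_block and the SEED_IP variable are hypothetical names introduced here, and the settings are simply the ones listed above.

```shell
#!/bin/bash
# Hypothetical generator for the three near-identical listen sections.
# Usage: listen_block NAME BIND_PORT SERVER_IP SERVER_PORT
listen_block() {
    cat <<EOF
listen $1
    bind *:$2
    mode tcp
    option tcplog
    log global
    balance roundrobin
    clitimeout 150000
    srvtimeout 150000
    contimeout 30000
    server server1 $3:$4 check

EOF
}

# Placeholder from this post; replace with the real seed node address.
SEED_IP="${SEED_IP:-cassandraSeedNodeIP}"

listen_block cassandraseed   7000 "$SEED_IP" 7000
listen_block cassandrathrift 9160 "$SEED_IP" 9160
listen_block cassandrajmx    8000 "$SEED_IP" 8080
```

The output can be appended to /etc/haproxy.cfg, after which haproxy -c -f /etc/haproxy.cfg checks the file for syntax errors before the restart.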

6.     Open the ports on the HAProxy node by running the following commands at the command prompt:

iptables -I INPUT 1 -p tcp --dport 7000 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 9160 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 8000 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

7.     Install Cassandra as a non-seed node on another machine.

8.     Change the Cassandra clustering configuration on the non-seed node in the file $CASSANDRA_HOME/conf/storage-conf.xml as follows:

1)   In the Seeds section, enter the IP of the HAProxy load balancer

<Seeds>

<Seed>HAProxy_Load_Balancer_IP</Seed>

</Seeds>

2)   Enter the IP of the Cassandra non-seed node in ListenAddress and ThriftAddress

<ListenAddress>cassandra_non-seed_ip</ListenAddress>

<ThriftAddress>cassandra_non-seed_ip</ThriftAddress>

3)   Turn on AutoBootstrap on the non-seed node

<AutoBootstrap>true</AutoBootstrap>
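
If you would rather script these edits than make them by hand, a small sed wrapper is enough. This is only a sketch: set_xml_value is a hypothetical helper, the addresses in the usage lines are invented, and it assumes each element sits on a single line in storage-conf.xml and that the new value contains no | or & characters.

```shell
#!/bin/bash
# Hypothetical helper: replace the text inside a single-line <Tag>...</Tag> element.
# Usage: set_xml_value FILE TAG VALUE
set_xml_value() {
    local file="$1" tag="$2" value="$3" tmp
    tmp="$(mktemp)" || return 1
    # Only lines holding both the opening and closing tag are rewritten.
    # The original file is overwritten only if sed succeeded.
    sed "s|<$tag>[^<]*</$tag>|<$tag>$value</$tag>|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example for the non-seed node (addresses below are made up for illustration):
# CONF="$CASSANDRA_HOME/conf/storage-conf.xml"
# set_xml_value "$CONF" Seed          192.0.2.10   # HAProxy load balancer IP
# set_xml_value "$CONF" ListenAddress 192.0.2.21   # this node's own IP
# set_xml_value "$CONF" ThriftAddress 192.0.2.21
# set_xml_value "$CONF" AutoBootstrap true
```

Keep a backup copy of storage-conf.xml before running anything like this against a real node.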

9.     Open the Cassandra ports on the non-seed node by running the following commands at the command prompt:

iptables -I INPUT 1 -p tcp --dport 7000 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 9160 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

iptables -I INPUT 1 -p tcp --dport 8080 -j ACCEPT

/etc/init.d/iptables save

/etc/init.d/iptables restart

10.    Restart HAProxy on the HAProxy node by running the following command:

/etc/init.d/haproxy restart

11.    Start the seed Cassandra node by running the following command:

$CASSANDRA_HOME/bin/cassandra

12.    Start the non-seed Cassandra node by running the following command:

$CASSANDRA_HOME/bin/cassandra

That’s it, your Cassandra machines are now clustered through HAProxy 🙂
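
As a first sanity check, each port that HAProxy exposes can be probed from another machine. The sketch below assumes bash (it relies on the /dev/tcp pseudo-device); check_port and report_ports are hypothetical helper names, and the port list is the HAProxy bind list from step 5.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
check_port() {
    local host="$1" port="$2"
    # The subshell opens file descriptor 3 on the socket and closes it on exit.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Report every HAProxy front-end port from this post as open or closed.
# Usage: report_ports HAPROXY_IP
report_ports() {
    local host="$1" port
    for port in 7000 9160 8000; do
        if check_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed"
        fi
    done
}

# report_ports HAProxy_Load_Balancer_IP
```

No timeout is set here, so a port that is silently dropped by a firewall may make the probe wait for the normal TCP connect timeout.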

You can verify the cluster with the Cassandra CLI. Run the CLI on both nodes with the following commands and connect each one to its own Thrift address:

  • On the seed node, run the following commands:

a. $CASSANDRA_HOME/bin/cassandra-cli

b. cassandra> connect seed_ip/9160

c. cassandra> set Keyspace1.Standard1['IIPL-1274']['name']='Sunil Kumar'

  • On the non-seed node, run the following commands:

a. $CASSANDRA_HOME/bin/cassandra-cli

b. cassandra> connect non-seed_ip/9160

c. cassandra> get Keyspace1.Standard1['IIPL-1274']

The output should be:

=> (column=6e616d65, value=Sunil Kumar, timestamp=1288196657949000)

Returned 1 results.

cheeeeeeeeeeeeeeeeeeers:)

About suniluiit

Technical Architect working in Big Data and Cloud technologies for the last 5 years, with overall software industry experience of around 9 years. Architected and working on the Impetus Workload Migration Product, which allows organizations to save 50%-80% of manual offloading time and cost. It provides faster, parallel and scalable data migration to Hadoop along with incremental data options. It also maximizes existing investments in code and the reuse of SQL scripts. Architected and developed a cloud-agnostic application for deployment and configuration management of enterprise applications, including technology stacks like CQ5, Cassandra, Solr, application server, web server, HAProxy, F5 and messaging server. Experienced in working with and leading R&D teams building new expertise in fields such as Big Data, ETL offloading to Big Data and Cloud computing. Worked on some of the Impetus open source products around Big Data and social media: http://code.google.com/p/hadoop-toolkit/ http://code.google.com/p/zing https://github.com/impetus-opensource Specialties: Big Data, Hadoop, Hive, Sqoop, Spark, J2EE/SOA, NoSQL, Cassandra, HBase, Cloud Computing (Private/Hybrid/Public: AWS, Google, Azure, Rackspace, OpenStack, VMware, Terremark), RabbitMQ, Kafka, Memcached, Puppet, HypericHQ, Splunk etc.