May 23, 2016


Load data into elastic using logstash

# install logstash


rpm -iUh logstash-2.3.2-1.noarch.rpm

# test logstash can generate json output
cd /opt/logstash/

bin/logstash -e 'input {file {path => "/var/log/cron" start_position => beginning} } output {stdout { codec => json } } '

# push data to elastic
bin/logstash -e 'input {file {path => "/var/log/cron" start_position => beginning} } output { elasticsearch { hosts => ""} } '
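
For a rough idea of what the json codec prints, here is a small Python 3 sketch that wraps each log line in a similar event. The field names (message, @version, @timestamp, path, host) are what Logstash 2.x usually emits for the file input, but treat the exact set as an assumption. line_to_event and events_from_lines are hypothetical helpers, not part of Logstash, and the sample cron line is made up.

```python
import json
import socket
from datetime import datetime, timezone


def line_to_event(line, path, host=None):
    """Wrap one log line in a Logstash-like event (field names are an assumption)."""
    now = datetime.now(timezone.utc)
    return {
        "message": line.rstrip("\n"),
        "@version": "1",
        # ISO 8601 timestamp in UTC with millisecond precision
        "@timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "path": path,
        "host": host or socket.gethostname(),
    }


def events_from_lines(lines, path, host=None):
    """Yield one JSON string per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.dumps(line_to_event(line, path, host))


if __name__ == "__main__":
    # Made-up sample line, only for illustration.
    sample = ["May 23 10:01:01 myhost CROND[2311]: (root) CMD (run-parts /etc/cron.hourly)\n"]
    for doc in events_from_lines(sample, "/var/log/cron"):
        print(doc)
```

Each printed line is one JSON document, which is roughly the shape that ends up in elastic when the output is switched from stdout to elasticsearch.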

May 22, 2016


Backing up data only containers

The advantage of a data-only container is that its data is easy to back up. For example, here the tutum/mysql:5.5 container stores its data in a data-only container (mylamp_crmdbvol1_1). The data container is committed and then pushed to Docker Hub. Since this data may contain important information, it is better to keep it in a private repository.

[root@ip-172-31-15-118 mylamp]# docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                                                  NAMES
99526bfa2f0f        mylamp_web            "/"           11 minutes ago      Up 11 minutes>80/tcp, 3306/tcp,>8000/tcp   mylamp_web_1
c7950eeab8e9        tutum/mysql:5.5       "/"           11 minutes ago      Up 11 minutes>3306/tcp                                 mylamp_mysql_1
3802f7c35133        tutum/ubuntu:trusty   "/"           11 minutes ago      Up 11 minutes>22/tcp                                     mylamp_crmdbvol1_1

[root@ip-172-31-15-118 mylamp]# docker commit 3802f7c35133 shantanuo/todel

[root@ip-172-31-15-118 mylamp]# docker login
Username: XXX
WARNING: login credentials saved in /root/.docker/config.json
Login Succeeded

[root@ip-172-31-15-118 mylamp]# docker push shantanuo/todel
The push refers to a repository [] (len: 1)
f8a3d4f0376b: Pushed
ebe73f29e6e1: Pushed
latest: digest: sha256:f1fda8ef1ac49ed013131177c9213716a3cb761671e0ff43315ed06422213f7a size: 28444
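
One caveat worth checking: docker commit does not include data held in volumes mounted in the container, so the pushed image only covers the container's own filesystem. A common alternative, which is not what the steps above do, is to archive the volume with tar through --volumes-from. The Python 3 sketch below only builds and prints that command by default. The container name and /var/lib/mysql come from this setup, while /root/backup, the ubuntu image and the helper names are assumptions.

```python
import subprocess


def volume_backup_command(data_container, volume_path, backup_dir,
                          archive="backup.tar", image="ubuntu"):
    """Build the docker command that tars a volume of data_container into backup_dir."""
    return [
        "docker", "run", "--rm",
        "--volumes-from", data_container,   # mount the data container's volumes
        "-v", "%s:/backup" % backup_dir,     # host directory that receives the archive
        image,
        "tar", "cvf", "/backup/%s" % archive, volume_path,
    ]


def backup_volume(data_container, volume_path, backup_dir, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = volume_backup_command(data_container, volume_path, backup_dir)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    # /root/backup is a hypothetical host directory (must be an absolute path).
    backup_volume("mylamp_crmdbvol1_1", "/var/lib/mysql", "/root/backup")
```

Call backup_volume(..., dry_run=False) on the docker host to actually create the archive.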

May 21, 2016


Decouple application using docker compose

# install packages
yum install -y git mysql-server docker

curl -L`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

/etc/init.d/docker start

# create and copy public key to github to clone private repo

ssh-keygen -t rsa -b 4096 -C ""

cat ~/.ssh/

# clone your private repository
git clone

# create your custom my.cnf config file

mkdir -p /my/custom/
vi /my/custom/my.cnf

# start docker containers as per yml file

/usr/local/bin/docker-compose up -d

# restore mysql data into container from host:
mysqladmin -h localhost -P 3306 --protocol=tcp -u root -ppasswd create livebox

mysql -h localhost -P 3306 --protocol=tcp -u root -ppasswd livebox < livebox.sql

# access mysql container from another container:

mysql -h mysql -uroot -ppasswd

# find the generated admin password in the logs of the tutum/mysql:5.5 container (use its container ID), e.g.
docker logs c7950eeab8e9
    mysql -uadmin -pxmFShXB1Asgn -h127.0.0.1

# use that password to connect, then allow root to log in from other hosts:
mysql> grant all on *.* to 'root'@'%' identified by 'passwd' with grant option;
Query OK, 0 rows affected (0.00 sec)

# data only container to be used for mysql data
docker run -d -v /var/lib/mysql --name db_vol1 -p 23:22 tutum/ubuntu:trusty
docker run -d --volumes-from db_vol1 -p 3306:3306 tutum/mysql:5.5

May 16, 2016


List all databases on elasticsearch server

Here is the Python code that lists all indices on an Elasticsearch server.
If the data is partitioned daily or hourly, the index names include the date (and hour) of each partition.

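import elasticsearch
es = elasticsearch.Elasticsearch('')
# assumed completion: cat.indices returns the same table as the curl call below
print(es.cat.indices(v=True))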

We can also use curl to get the same list...

# curl ''
health status index            pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blank-struts2      5   1          0            0       575b           575b
yellow open   vq-2016-05-02-07   5   1          1            0      3.4kb          3.4kb
yellow open   vq-2016-05-02-09   5   1         94            0     41.6kb         41.6kb
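
When the indices are partitioned by day or hour, the date can be read back from the index name. The Python 3 sketch below assumes names end in YYYY-MM-DD or YYYY-MM-DD-HH, as in the vq-* indices listed above. partition_time and latest_partition are hypothetical helpers, not part of the elasticsearch client.

```python
from datetime import datetime


def partition_time(index_name):
    """Return the datetime encoded at the end of an index name, or None."""
    parts = index_name.split("-")
    # Try the hourly suffix (YYYY-MM-DD-HH) first, then the daily one (YYYY-MM-DD).
    for count, fmt in ((4, "%Y-%m-%d-%H"), (3, "%Y-%m-%d")):
        if len(parts) > count:
            try:
                return datetime.strptime("-".join(parts[-count:]), fmt)
            except ValueError:
                pass
    return None


def latest_partition(index_names):
    """Return (name, datetime) of the newest dated index, or None if none is dated."""
    dated = [(name, partition_time(name)) for name in index_names]
    dated = [(name, ts) for name, ts in dated if ts is not None]
    return max(dated, key=lambda item: item[1]) if dated else None


if __name__ == "__main__":
    names = ["blank-struts2", "vq-2016-05-02-07", "vq-2016-05-02-09"]
    print(latest_partition(names))
```

Indices without a date suffix, such as blank-struts2, are simply skipped.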

May 14, 2016


Python multiprocessing made easy

This is how a sequential download looks. It is slow because the files are downloaded one after another. The multiprocessing module can speed this up, but it is complicated to use.
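import os
import requests

def worker(url):
    # save the file under the last part of the url
    fname = os.path.basename(url)
    print("%s %s" % (fname, url))
    print("\n")
    r = requests.get(url)
    with open(fname, "wb") as f:
        f.write(r.content)

def run(urls):
    # download the files one after another
    for url in urls:
        print(url)
        worker(url)

urls = ["",
            "" ]

run(urls)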

I am not even showing how multiprocessing would do this, because there is a new module called deco!
I need to add only 3 lines to the code: one import and two decorators!
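from deco import concurrent, synchronized
import os
import requests

@concurrent
def worker(url):
    # runs in a separate process for each call
    fname = os.path.basename(url)
    print("%s %s" % (fname, url))
    print("\n")
    r = requests.get(url)
    with open(fname, "wb") as f:
        f.write(r.content)

@synchronized
def run(urls):
    # waits for all the concurrent workers before returning
    for url in urls:
        print(url)
        worker(url)

urls = ["",
            "" ]

run(urls)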

deco can be installed with pip install deco and is available from here...
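
For comparison, a similar effect is possible with the standard library alone. The Python 3 sketch below uses a thread pool from concurrent.futures rather than deco or multiprocessing, which is usually enough because downloads mostly wait on the network. The fetcher and dest_dir parameters and the example URLs are assumptions added so the demo can run without network access.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download url and return its body as bytes (standard library only)."""
    with urlopen(url) as response:
        return response.read()


def worker(url, fetcher=fetch, dest_dir="."):
    """Save url under its base name in dest_dir and return the file path."""
    fname = os.path.join(dest_dir, os.path.basename(url))
    with open(fname, "wb") as f:
        f.write(fetcher(url))
    return fname


def run(urls, fetcher=fetch, dest_dir=".", max_workers=4):
    """Download all urls concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: worker(u, fetcher, dest_dir), urls))


if __name__ == "__main__":
    # Demo without network access: a stand-in fetcher returns the URL itself as bytes.
    fake_fetcher = lambda url: url.encode()
    with tempfile.TemporaryDirectory() as tmp:
        print(run(["http://example.com/a.txt", "http://example.com/b.txt"],
                  fake_fetcher, tmp))
```

Calling run(urls) with the default fetcher performs real downloads into the current directory.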

And here is how we can achieve the same with less code, using just one Linux command, "parallel".

# parallel -j4 --lb "python  -u worker {1} " ::: ""  ""  ""  ""

# cat
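import os
import requests

def worker(url):
    # save the file under the last part of the url
    fname = os.path.basename(url)
    r = requests.get(url)
    with open(fname, "wb") as f:
        f.write(r.content)

if __name__ == '__main__':
    import fire
    # expose worker as a command line sub-command
    fire.Fire()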

Add --lb and -u to keep Parallel and Python, respectively, from buffering the output, so you can see the downloads running in parallel.

May 11, 2016


Selling docker and python

Many people have not fully grasped the power of docker. It can be used for various purposes. For example, here is an easy way to convince people to upgrade to Python 3:

# docker run -it --rm shantanuo/pyrun:2.7 python
>>> 3/2

# docker run -it --rm shantanuo/pyrun:3.4 python
>>> 3/2

Python 2.7 performs integer division and returns 1, while Python 3.4 performs true division and correctly returns 1.5
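
The same difference can be seen inside Python 3 itself, where / is true division and // is floor division. In Python 2.7, from __future__ import division makes / behave the same way. divide_demo is just an illustrative helper.

```python
def divide_demo(a, b):
    """Return (true division, floor division) of a by b under Python 3 rules."""
    return a / b, a // b


if __name__ == "__main__":
    true_div, floor_div = divide_demo(3, 2)
    print(true_div)   # 1.5
    print(floor_div)  # 1
```

Note that // rounds down rather than toward zero, so -3 // 2 gives -2.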

