Shantanu's Blog

Database Consultant

December 31, 2017

Machine learning basics

Machine learning is used to learn from given data and then predict new values. For example, here is the share price of a company for a range of years.

import numpy as np
xval = np.array([2001,2002,2003,2003,2004,2003,2006,2008,2009,2010]).reshape(-1,1)
yval = [1,2,3,4,5,6,7,7,9,10]

Next, we create a linear regression model that will learn from this data...

import sklearn.linear_model as skl
model = skl.LinearRegression()

The fit method of the model learns from the data, and the predict method then estimates new values. In this case, the expected price for the year 2012 is around 11.66:

model.fit(xval, yval)
model.predict([[2012]])  # predict expects a 2-D input: one row per sample

array([ 11.66141732])
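To see what fit computes, here is a minimal sketch of the same straight-line fit written out as ordinary least squares in plain Python, without scikit-learn. It assumes LinearRegression's default behaviour: it fits an intercept and minimizes the squared error. The name fit_line is only illustrative. With the same ten data points, it reproduces the 11.66 prediction.

```python
# Ordinary least squares for a single feature, written out by hand.
years = [2001, 2002, 2003, 2003, 2004, 2003, 2006, 2008, 2009, 2010]
prices = [1, 2, 3, 4, 5, 6, 7, 7, 9, 10]

def fit_line(x, y):
    """Return (slope, intercept) of the line that minimizes squared error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance-like and variance-like sums around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(years, prices)
prediction_2012 = slope * 2012 + intercept
print(round(prediction_2012, 2))  # 11.66
```

The duplicated years in the data do not break the fit. Each pair is simply one more point that the line tries to stay close to.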

We can also plot the data to understand how the values move across the years...

import matplotlib.pyplot as plt  # preferred over the older pylab interface
plt.scatter(xval, yval)
plt.show()

Labels: python

posted by shantanu # Sunday, December 31, 2017

December 30, 2017

Install MySQL with the TokuDB engine using Percona Server

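This is required if you get an error while initializing the TokuDB engine, because TokuDB will not start while transparent huge pages are enabled. Run it as root on the Docker host: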

echo never > /sys/kernel/mm/transparent_hugepage/enabled

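And this is required if you get a permissions error on the data directory. The chown gives the directory to user ID 1001, the user MySQL runs as inside the container: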

rm -rf /storage/custom3381

mkdir /storage/custom3381

chown 1001 /storage/custom3381

The Percona Server image has a built-in environment variable, INIT_TOKUDB, that enables TokuDB:

docker run -p 3381:3306 -v /my/custom3381:/etc/mysql/conf.d -v /storage/custom3381:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=india3381 -e INIT_TOKUDB=1 -d percona/percona-server:5.7
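If TokuDB still fails to initialize, check that transparent huge pages are really disabled on the host. The kernel file lists all modes and marks the active one in square brackets, for example "always madvise [never]". Here is a small Python sketch that reads it; the helper names are hypothetical. The echo command only changes the running kernel, so the setting is lost after a reboot.

```python
from pathlib import Path

# Kernel file used in the echo command above
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active mode found in %r" % text)

def thp_disabled(path=THP_PATH):
    """True if transparent huge pages are set to 'never' on this host."""
    return active_thp_mode(path.read_text()) == "never"
```

On the Docker host, print(thp_disabled()) should show True before the container is started.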

Labels: docker, mysql

posted by shantanu # Saturday, December 30, 2017

Using XtraBackup for incremental backups

1) Download the XtraBackup package
2) Change to its directory
3) Take a full backup
4) Take incremental backups
5) Restore (prepare) the backups
6) Start MySQL using the backup

# Linux (generic binary tarball)
wget https://www.percona.com/downloads/XtraBackup/Percona-XtraBackup-2.4.9/binary/tarball/percona-xtrabackup-2.4.9-Linux-x86_64.tar.gz

# CentOS and Red Hat
yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
yum install percona-xtrabackup-24
_____

cd percona-xtrabackup-2.4.9-Linux-x86_64

bin/xtrabackup --defaults-file=/my/custom3396/my.cnf -H 172.31.0.57 -uroot -pindia3396 -P 3396 --datadir /storage/mysql/datadir3396 --backup --target-dir=/data3/backups/full/
_____

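The main advantage of XtraBackup is that we can take incremental backups. These copy only the data pages that changed since the previous backup, so they are much faster. The first incremental backup uses the full backup as its base: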

bin/xtrabackup --defaults-file=/my/custom3396/my.cnf -H 172.31.0.57 -uroot -pindia3396 -P 3396 --datadir /storage/mysql/datadir3396 --backup --target-dir=/data3/backups/inc1 --incremental-basedir=/data3/backups/full/

The next day, we simply change the target directory to "inc2" and use "inc1" as the incremental base:

bin/xtrabackup --defaults-file=/my/custom3396/my.cnf -H 172.31.0.57 -uroot -pindia3396 -P 3396 --datadir /storage/mysql/datadir3396 --backup --target-dir=/data3/backups/inc2 --incremental-basedir=/data3/backups/inc1
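Each incremental backup uses the previous backup as its --incremental-basedir, so the backups form a chain: full, inc1, inc2, and so on. Here is a minimal Python sketch that builds this chain of commands for any number of days. The function names are hypothetical. The connection options from the commands above (--defaults-file, -H, -u, -p, -P, --datadir) are left out for brevity and would need to be added back.

```python
def incremental_chain(full_dir, inc_dirs):
    """Pair each incremental target with the backup it is based on."""
    pairs = []
    base = full_dir
    for target in inc_dirs:
        pairs.append((target, base))
        base = target  # the next incremental builds on this one
    return pairs

def backup_command(target, base=None):
    """Build one xtrabackup --backup command as a list of arguments."""
    # Connection options are omitted; if --defaults-file is added back,
    # it has to be the first option after the binary name.
    cmd = ["bin/xtrabackup", "--backup", "--target-dir=" + target]
    if base is not None:
        cmd.append("--incremental-basedir=" + base)
    return cmd

full = "/data3/backups/full/"
print(" ".join(backup_command(full)))
for target, base in incremental_chain(full, ["/data3/backups/inc1", "/data3/backups/inc2"]):
    print(" ".join(backup_command(target, base)))
```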
_____

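In case of disaster, we need to prepare the backups. First apply the logs to the full backup, then merge each incremental backup into it in the order the backups were taken: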

1) First apply the logs to the full backup directory:
bin/xtrabackup --prepare --apply-log-only --target-dir=/data3/backups/full/

2) Apply the logs from the first incremental backup:
bin/xtrabackup --prepare --apply-log-only --target-dir=/data3/backups/full/ --incremental-dir=/data3/backups/inc1

3) Apply the last incremental backup. The --apply-log-only option should not be used for the last incremental backup:
bin/xtrabackup --prepare --target-dir=/data3/backups/full/ --incremental-dir=/data3/backups/inc2

4) Finally, prepare the full backup directory once more, without the --apply-log-only option:
bin/xtrabackup --prepare --target-dir=/data3/backups/full/
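The rule in steps 1 to 4 is this: use --apply-log-only on the full backup and on every incremental except the last one. Here is a minimal Python sketch that generates the prepare sequence for any number of incrementals. The function name is hypothetical, and the sketch only builds the commands; it does not run them.

```python
def prepare_steps(full_dir, inc_dirs):
    """Return the xtrabackup --prepare commands, in order, as argument lists."""
    # 1) Prepare the full backup, keeping uncommitted transactions for now
    steps = [["bin/xtrabackup", "--prepare", "--apply-log-only",
              "--target-dir=" + full_dir]]
    # 2-3) Merge each incremental into the full backup, in the order taken;
    # only the last one is merged without --apply-log-only
    for i, inc in enumerate(inc_dirs):
        cmd = ["bin/xtrabackup", "--prepare"]
        if i < len(inc_dirs) - 1:
            cmd.append("--apply-log-only")
        cmd += ["--target-dir=" + full_dir, "--incremental-dir=" + inc]
        steps.append(cmd)
    # 4) Final prepare of the full backup directory
    steps.append(["bin/xtrabackup", "--prepare", "--target-dir=" + full_dir])
    return steps

for cmd in prepare_steps("/data3/backups/full/", ["/data3/backups/inc1", "/data3/backups/inc2"]):
    print(" ".join(cmd))
```

For the two incrementals in this post, it prints the same four commands as steps 1 to 4.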
_____

Now that the backup data directory is prepared, we can create a new Docker container pointing to the "prepared" data.

docker run -p 3391:3306 -e MYSQL_ROOT_PASSWORD=india3391 -v /my/custom3391:/etc/mysql/conf.d -v /data3/backups/full:/var/lib/mysql -d shantanuo/mysql:5.7

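You can check whether the restored data is working correctly. Use the root password of the original server (india3396), because the MYSQL_ROOT_PASSWORD variable is normally applied only when the container initializes an empty data directory.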

mysql -h `hostname -i` -uroot -pindia3396 -P 3391

Labels: mysql

posted by shantanu # Saturday, December 30, 2017

December 19, 2017

Using property in a Python class

Here is what a standard class looks like. When I call the monthly method, I get the default value of 35000. I can, however, set a new value by calling another method called monthly_updated.

class pay_check:
    def __init__(self):
        self._salary = 35000

    def monthly(self):
        return self._salary

    def monthly_updated(self, value):
        self._salary = value

myclass = pay_check()
myclass.monthly()               # 35000
myclass.monthly_updated(40000)
myclass.monthly()               # 40000

This works, but the usability of the class can be improved with the property decorator. I turn the monthly method into a getter that is called whenever the user reads myclass.monthly. I also add a setter with the same name that is called whenever a value is assigned to it.

class pay_check:
    def __init__(self):
        self._salary = 35000

    @property
    def monthly(self):
        return self._salary

    @monthly.setter
    def monthly(self, value):
        self._salary = value

myclass = pay_check()

Instead of myclass.monthly(), I can now simply use myclass.monthly (without parentheses):
myclass.monthly

Another advantage is that I can use the same name to set a new value, as shown below:
myclass.monthly=50000

Now the salary is 50000, as returned by this:
myclass.monthly

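There are many advantages to this style of programming. The code is readable and elegant. It is also easy to maintain: if the getter or setter later needs to validate or compute the value, the code that uses myclass.monthly does not have to change. Users may be slightly confused by the property concept at first, since they have only seen methods called like functions. But once they understand it, they cannot live without it!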

For example, pandas uses the same idea:
df.columns returns the column headings, but I can assign to the same attribute to change the column names like this...
df.columns=['name', 'experience', 'remuneration', 'amount']

Or set a new value for the entire column:
df['dummy'] = '0'

And return the values of that column using the same expression like this...
df['dummy']
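Here is a small runnable version of these pandas examples. It uses a hypothetical two-row DataFrame; the initial column names a, b, c, d and the values are made up. df.columns behaves like a property: reading it calls a getter, and assigning to it calls a setter. Column access with square brackets works differently, through the __getitem__ and __setitem__ methods.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame([["amit", 5, 40000, 480000], ["neha", 3, 35000, 420000]],
                  columns=["a", "b", "c", "d"])

print(list(df.columns))  # getter: ['a', 'b', 'c', 'd']

# setter: replaces all four column labels at once
df.columns = ['name', 'experience', 'remuneration', 'amount']

df['dummy'] = '0'              # __setitem__: adds a new column
print(df['dummy'].tolist())    # __getitem__: ['0', '0']
```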

Understanding how the "get", "set" and "del" operations of a property are handled in a class is very important for managing class instances.
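The pay_check example uses only a getter and a setter. Here is a minimal sketch that also adds a deleter and some validation. The class name PayCheck, the non-negative check and the reset-to-default behaviour of del are assumptions made for illustration; they are not part of the original class.

```python
class PayCheck:
    """Hypothetical variant of pay_check with validation and a deleter."""

    DEFAULT_SALARY = 35000

    def __init__(self, salary=DEFAULT_SALARY):
        self.monthly = salary  # goes through the setter below

    @property
    def monthly(self):
        """Getter: runs on `obj.monthly`."""
        return self._salary

    @monthly.setter
    def monthly(self, value):
        """Setter: runs on `obj.monthly = value`."""
        if value < 0:
            raise ValueError("salary cannot be negative")
        self._salary = value

    @monthly.deleter
    def monthly(self):
        """Deleter: runs on `del obj.monthly`; here it resets to the default."""
        self._salary = self.DEFAULT_SALARY

pay = PayCheck()
pay.monthly = 50000
print(pay.monthly)  # 50000
del pay.monthly
print(pay.monthly)  # 35000
```

Because __init__ assigns through the property, the validation also applies to the starting value.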

Labels: pandas, python

posted by shantanu # Tuesday, December 19, 2017

December 17, 2017

List all files in an S3 bucket and find the public ones

# Here is the Python code that lists the files in a given S3 bucket and checks whether any of them is publicly readable. Change your-bucket-name, the region and the access / secret keys.

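import boto.s3
from boto.s3.connection import OrdinaryCallingFormat

# S3 grants to this group URI apply to everyone, i.e. the object is public
ALL_USERS = 'http://acs.amazonaws.com/groups/global/AllUsers'

conn = boto.s3.connect_to_region('ap-south-1', aws_access_key_id='xxx', aws_secret_access_key='xxx', calling_format=OrdinaryCallingFormat())

mybucket = conn.get_bucket('your-bucket-name')
for key in mybucket.list():
    for grant in key.get_acl().acl.grants:
        # READ granted to the owner or a named user is not public,
        # so also check that the grantee is the AllUsers group
        if grant.permission == 'READ' and grant.uri == ALL_USERS:
            print("PUBLIC: " + str(key))
            #key.set_acl('private')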
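The boto library used above is the older boto 2 SDK. For comparison, here is a minimal sketch with boto3. This is a swapped-in library, not the code from this post, and it assumes credentials are configured in the usual way, for example in ~/.aws/credentials. The is_public helper (a hypothetical name) works on the dictionary returned by get_object_acl, so it can be tested without AWS access. It treats READ or FULL_CONTROL granted to the AllUsers group as public.

```python
# URI that S3 uses for the "everyone" group in ACL grants
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

def is_public(acl):
    """acl is a dict shaped like the response of boto3's get_object_acl()."""
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if (grantee.get("URI") == ALL_USERS
                and grant.get("Permission") in ("READ", "FULL_CONTROL")):
            return True
    return False

def public_keys(bucket, region="ap-south-1"):
    """Yield the key of every object in the bucket that is publicly readable."""
    import boto3  # imported here so is_public() can be used without boto3
    s3 = boto3.client("s3", region_name=region)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            acl = s3.get_object_acl(Bucket=bucket, Key=obj["Key"])
            if is_public(acl):
                yield obj["Key"]

# Usage (needs AWS access):
# for key in public_keys("your-bucket-name"):
#     print("PUBLIC: " + key)
```

The paginator matters for large buckets, because a single list call returns at most 1000 keys.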

Labels: aws, boto

posted by shantanu # Sunday, December 17, 2017

December 07, 2017

Docker restart problems

If you restart the server, or if Docker ends abnormally, for example after a

kill -9 {DOCKER_PID}

then you may get an error while restarting your containers.

# docker restart 2dc3fc6e5e3e d6d9d1dab040

Error response from daemon: Cannot restart container 2dc3fc6e5e3e: oci runtime error: container with id exists: 2dc3fc6e5e3e5b63c9d3ad8074972b72867b9ccd250b4c7fced42c616adc2070
Error response from daemon: Cannot restart container d6d9d1dab040: oci runtime error: container with id exists: d6d9d1dab0407706ef4ec37d0bacfe43134054ddd0b7a06d9b97434d0c288564

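The solution is to remove the stale state directories that runc and containerd keep for these containers. The directories are named after the full container IDs, such as the ones shown in the error messages: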
# rm -rf /run/runc/80768bc717f353484ab54b306bca0506861688d0b1ae0f3d724208cb37cad047
# rm -rf /run/containerd/80768bc717f353484ab54b306bca0506861688d0b1ae0f3d724208cb37cad047
# rm -rf /run/runc/2dc3fc6e5e3e5b63c9d3ad8074972b72867b9ccd250b4c7fced42c616adc2070
# rm -rf /run/containerd/2dc3fc6e5e3e5b63c9d3ad8074972b72867b9ccd250b4c7fced42c616adc2070
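The directories to delete are named after the full 64-character container IDs that appear in the error messages. Here is a small Python sketch (the helper names are hypothetical) that extracts those IDs from the docker restart output. It prints the matching rm commands for review instead of running them. The /run/runc and /run/containerd paths are the ones used in this post and may differ between Docker versions.

```python
import re

# Error text as printed by "docker restart" in this post
errors = """\
Error response from daemon: Cannot restart container 2dc3fc6e5e3e: oci runtime error: container with id exists: 2dc3fc6e5e3e5b63c9d3ad8074972b72867b9ccd250b4c7fced42c616adc2070
Error response from daemon: Cannot restart container d6d9d1dab040: oci runtime error: container with id exists: d6d9d1dab0407706ef4ec37d0bacfe43134054ddd0b7a06d9b97434d0c288564
"""

def stale_ids(text):
    """Return the full container IDs reported as 'container with id exists'."""
    return re.findall(r"container with id exists: ([0-9a-f]+)", text)

def cleanup_commands(ids, roots=("/run/runc", "/run/containerd")):
    """Build (but do not run) one rm command per state directory."""
    return ["rm -rf %s/%s" % (root, cid) for cid in ids for root in roots]

for cmd in cleanup_commands(stale_ids(errors)):
    print(cmd)
```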

Labels: docker

posted by shantanu # Thursday, December 07, 2017

Binder to host Python notebooks for free, without a server of your own

You can easily build an IPython notebook environment using Binder.

1) Visit the Binder page:
https://beta.mybinder.org

2) Type the GitHub repository name or URL:
https://github.com/psychemedia/showntell

3) Enter the Git branch:
maths

Click Launch. Binder will build a ready-to-use environment that you can start working in immediately. Select the OpenLearn_Geometry.ipynb file and then click the "show codecell inputs" button to show the hidden cells.
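The values from steps 2 and 3 can also be combined into a direct launch link. Here is a minimal sketch, assuming the current mybinder.org URL scheme https://mybinder.org/v2/gh/USER/REPO/BRANCH for GitHub repositories. The helper name binder_url is hypothetical.

```python
from urllib.parse import quote

def binder_url(github_repo, branch, host="https://mybinder.org"):
    """Build a Binder launch link for a public GitHub repository.

    github_repo may be "user/repo" or a full https://github.com/user/repo URL.
    """
    prefix = "https://github.com/"
    if github_repo.startswith(prefix):
        github_repo = github_repo[len(prefix):]
    github_repo = github_repo.strip("/")
    # encode the branch so names containing "/" stay in one path segment
    return "%s/v2/gh/%s/%s" % (host, github_repo, quote(branch, safe=""))

print(binder_url("https://github.com/psychemedia/showntell", "maths"))
# https://mybinder.org/v2/gh/psychemedia/showntell/maths
```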
_____

If your code uses third-party modules, you will need a requirements.txt file in your repository listing all the modules required to run your code. For example:

pandas
geopandas

If you need to execute certain commands after the modules are installed, you will also need a postBuild file, a script that Binder runs after the environment is built. You can see an example of its contents here:

https://github.com/psychemedia/showntell/blob/maths/postBuild

Labels: git, pandas, python, usability

posted by shantanu # Thursday, December 07, 2017

December 03, 2017

File details in a pandas DataFrame

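Here is the code that lists all files under the current folder and creates a nice DataFrame. Run it from /home/, or change Path(".") to Path("/home/"), to list the files in the /home/ folder.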

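import pandas as pd
from pathlib import Path
import time

p = Path(".")  # folder to scan, e.g. Path("/home/")
all_files = []
for i in p.rglob('*'):  # '*' also matches files without an extension
    if i.is_file():
        st = i.stat()
        # st_ctime is the creation time on Windows, but on Linux it is
        # the time of the last metadata change
        all_files.append((i.name, i.parent, time.ctime(st.st_ctime), st.st_size))

columns = ["File_Name", "Parent", "Created", "size"]
df = pd.DataFrame.from_records(all_files, columns=columns)

df.to_csv('file_list.csv', sep='\t')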

It is easy to export it to Excel, but I prefer not to do that and to continue working within the IPython environment.
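Within IPython, the DataFrame can be explored directly. Here is a small sketch with hypothetical records in the same shape as all_files (the file names, folders and sizes are made up). It finds the largest files and the total size per folder.

```python
import pandas as pd

# Hypothetical records: (File_Name, Parent, Created, size in bytes)
records = [("a.csv", "data", "Sun Dec  3 10:00:00 2017", 1200),
           ("b.csv", "data", "Sun Dec  3 10:05:00 2017", 800),
           ("notes.txt", "docs", "Sun Dec  3 11:00:00 2017", 300)]
df = pd.DataFrame.from_records(records, columns=["File_Name", "Parent", "Created", "size"])

# The two largest files
largest = df.sort_values("size", ascending=False).head(2)
print(largest[["File_Name", "size"]])

# Total bytes per folder
per_folder = df.groupby("Parent")["size"].sum()
print(per_folder)
```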

Labels: pandas, python

posted by shantanu # Sunday, December 03, 2017

Analyze Chrome history using pandas

You can install a free Chrome extension that downloads your browsing history in JSON format.

#https://chrome.google.com/webstore/detail/hcohnnbbiggngobheobhdipbgmcbelhh/publish-accepted

The JSON file can be imported into a pandas DataFrame. We then convert the epoch time (stored in milliseconds) to a readable date-time and find the most visited domain names.

import pandas as pd
from urllib.parse import urlparse

df = pd.read_json('history.json')
# lastVisitTime is stored in milliseconds since the epoch
df['date'] = pd.to_datetime(df['lastVisitTime'], unit='ms')

def extract(myurl):
    # keep only the domain name, e.g. "www.python.org"
    return urlparse(myurl).netloc

df['newurl'] = df.url.apply(extract)
df.newurl.value_counts()
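Both steps, converting lastVisitTime from milliseconds since the epoch and counting domains, can also be done without pandas, as in this small sketch. The records are hypothetical, shaped like the url and lastVisitTime fields used above. pandas' to_datetime(..., unit='ms') produces timezone-naive UTC timestamps, so the sketch converts in UTC as well.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical history records (lastVisitTime in milliseconds)
history = [
    {"url": "https://www.python.org/doc/", "lastVisitTime": 1512259200000},
    {"url": "https://pandas.pydata.org/docs/", "lastVisitTime": 1512262800000},
    {"url": "https://www.python.org/downloads/", "lastVisitTime": 1512266400000},
]

def visit_time(ms):
    """Convert milliseconds since the epoch to a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Count visits per domain, like value_counts() on the netloc column
domains = Counter(urlparse(rec["url"]).netloc for rec in history)
print(domains.most_common(1))                    # [('www.python.org', 2)]
print(visit_time(history[0]["lastVisitTime"]))   # 2017-12-03 00:00:00+00:00
```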

Labels: pandas, python

posted by shantanu # Sunday, December 03, 2017
