January 29, 2019


List recent bookmarks on medium

Here is a Python script to list the last 10 claps from a Medium blog. Change @user to the actual username, e.g. @shantanuo.

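import requests
import json

def clean_json_response(response):
    # Medium prepends "])}while(1);</x>" to its JSON responses; drop it before parsing
    return json.loads(response.text.split('])}while(1);</x>')[1])

url = ''
response = requests.get(url)
response_dict = clean_json_response(response)
for i in response_dict['payload']['references']['Post'].keys():
    # print the ID of each referenced post
    print(i)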
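
The prefix handling can be tried offline, without calling Medium. The following is a minimal sketch under stated assumptions: `strip_json_prefix` is a hypothetical helper name, and `sample_text` is a made-up payload (including the `title` field) that only imitates the `payload` / `references` / `Post` nesting used in the script.

```python
import json

# Guard prefix that Medium puts in front of its JSON responses.
PREFIX = "])}while(1);</x>"

def strip_json_prefix(text, prefix=PREFIX):
    """Parse the JSON that follows the guard prefix (hypothetical helper)."""
    if text.startswith(prefix):
        text = text[len(prefix):]
    return json.loads(text)

# Made-up response body, for illustration only.
sample_text = PREFIX + json.dumps(
    {"payload": {"references": {"Post": {
        "abc123": {"title": "First"},
        "def456": {"title": "Second"},
    }}}}
)

posts = strip_json_prefix(sample_text)["payload"]["references"]["Post"]
for post_id, post in posts.items():
    print(post_id, post["title"])
```

Unlike `split(...)[1]`, checking `startswith` also tolerates a response that arrives without the prefix.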


This will create a list something like this...


The gist is available here...


January 18, 2019


Using kaggle command line with google colab

Downloading any file from Kaggle using the command line is easy:

pip install kaggle

echo '{"username":"shantanuo","key":"c90c207ab8d6c445c54f77c5d5dcdedbx"}' > /root/.kaggle/kaggle.json

kaggle competitions download -c cifar-10
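
The `echo` command assumes that the `.kaggle` directory already exists. As a rough alternative, the credential file could be written from Python. This is only a sketch: `write_kaggle_credentials` is a hypothetical helper, the username and key are placeholders, and the demo writes to a temporary directory rather than a real home directory.

```python
import json
import os
import tempfile

def write_kaggle_credentials(username, key, home_dir):
    """Write <home_dir>/.kaggle/kaggle.json and restrict it to the owner."""
    config_dir = os.path.join(home_dir, ".kaggle")
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as fh:
        json.dump({"username": username, "key": key}, fh)
    os.chmod(path, 0o600)  # owner read/write only
    return path

# Demo against a throwaway directory; pass os.path.expanduser("~") for real use.
demo_home = tempfile.mkdtemp()
cred_path = write_kaggle_credentials("your-username", "your-api-key", demo_home)
print(cred_path)
```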


If you are using Google Colab:

Create the API token by visiting the “My Account” page on Kaggle. This will download a kaggle.json file to your computer. Next, we need to upload this credential file to Colab:

from google.colab import files
files.upload()

Then we can install Kaggle API and save the credential file in the “.kaggle” directory.

!pip install -U -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

Now we can download the dataset:

!kaggle datasets download -d uciml/pima-indians-diabetes-database

This dataset will be downloaded to your current working directory, which is the “content” folder in Colab. As files get deleted every time you restart your Colab session, it’s a good idea to save files in your Google Drive. You just need to mount the drive using the code below and save there:

from google.colab import drive
drive.mount('/content/drive')
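
Saving a downloaded file is then an ordinary file copy. The sketch below assumes a hypothetical helper `save_to_persistent_dir` and a made-up file name; it uses temporary directories so it runs anywhere, whereas in Colab the destination would be a folder under the mounted Drive.

```python
import os
import shutil
import tempfile

def save_to_persistent_dir(src_path, dest_dir):
    """Copy src_path into dest_dir, creating dest_dir if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(src_path, dest_dir)

# Temporary directories stand in for the Colab working folder and the Drive folder.
work_dir = tempfile.mkdtemp()
drive_dir = os.path.join(tempfile.mkdtemp(), "kaggle")

# Made-up file, for illustration only.
src = os.path.join(work_dir, "example.csv")
with open(src, "w") as fh:
    fh.write("col_a,col_b\n1,2\n")

saved = save_to_persistent_dir(src, drive_dir)
print(saved)
```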


January 15, 2019


Using Dask for text columns

Dask works really well when most of the columns are numeric. But if I have a few columns with a lot of text (up to a few thousand characters) that includes special characters escaped by \, then dask does not work as expected. For example:

26546,"trans_sarotp-527817b4dbe92fc7","AARSAK","Dear User, TranID \"RNS2018113031387\" of Amount 200,Pending for Checking,01/12/2018-00:03:13 - CUSTOMER CARE",1,20181201000311

Pandas somehow reads this data, but there are cases when dask does not.

import dask.dataframe as dd

df = dd.read_csv('s3://mybucket/somefile.csv', error_bad_lines=False, header=None,
                 dtype=str, escapechar='\\',
                 encoding="ISO-8859-1", engine='python',
                 storage_options={'anon': True})

To use the escapechar parameter I need the "C" engine, because the "python" engine does not work. But the "C" engine fails to parse some of the text columns, maybe due to encoding issues. error_bad_lines is another parameter that does not behave the way it does in pandas, for obvious reasons.
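
When a reader misbehaves, it helps to know what a correct parse of the row should look like. A small sketch using only Python's standard `csv` module (not pandas or dask) on the sample row from this post:

```python
import csv
import io

# The sample row from above; "\\" in the Python source is a single backslash.
sample = ('26546,"trans_sarotp-527817b4dbe92fc7","AARSAK",'
          '"Dear User, TranID \\"RNS2018113031387\\" of Amount 200,'
          'Pending for Checking,01/12/2018-00:03:13 - CUSTOMER CARE",'
          '1,20181201000311\n')

# escapechar tells the parser that \" is a literal quote, not the end of the field.
reader = csv.reader(io.StringIO(sample), quotechar='"', escapechar='\\')
row = next(reader)

print(len(row))
print(row[3])
```

The row should come out as six fields, with the commas and escaped quotes of the message kept inside the fourth field. Any reader that returns a different number of columns for this line is splitting on the wrong commas.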

There are cases when a dask dataframe is able to read the data while dask distributed fails to read the same data. Overall, dask seems to have a very limited use case. It is not a general-purpose solution.


January 12, 2019


Using tensorflow hub

Here are fewer than 10 lines of code to train your model based on ELMo. No need to download the model yourself because it is hosted on TensorFlow Hub and can be loaded dynamically!

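import tensorflow as tf
import tensorflow_hub as hub
url = ""
embed = hub.Module(url)
# sentences is a list of strings prepared beforehand
embeddings = embed(sentences, signature="default", as_dict=True)["default"]

with tf.Session() as sess:
  # initialize the module's variables and lookup tables before running it
  sess.run(tf.global_variables_initializer())
  sess.run(tf.tables_initializer())
  x = sess.run(embeddings)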

And here is how to use the module on a test string, e.g. "slave".

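import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

search_string = "slave" #@param {type:"string"}
results_returned = "3" #@param [1, 2, 3]

embeddings2 = embed([search_string], signature="default", as_dict=True)["default"]

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(tf.tables_initializer())
  search_vect = sess.run(embeddings2)

cosine_similarities = pd.Series(cosine_similarity(search_vect, x).flatten())
output = ""
for i, j in cosine_similarities.nlargest(int(results_returned)).items():
  output += "<p>"
  for word in sentences[i].split():
    if word.lower() in search_string:
      # highlight words that match the search string
      output += " <b>" + str(word) + "</b>"
    else:
      output += " " + str(word)
  output += "</p>"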
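
The ranking step itself does not depend on ELMo. Here is a minimal NumPy sketch of cosine-similarity ranking, assuming a hypothetical helper `top_k_cosine` and a made-up 2-dimensional `toy_embeddings` matrix in place of real sentence vectors:

```python
import numpy as np

def top_k_cosine(query, matrix, k=3):
    """Return (row_index, similarity) pairs for the k rows most similar to query."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # cosine similarity = dot product divided by the product of the vector norms
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

# Toy vectors, for illustration only.
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = top_k_cosine([1.0, 0.0], toy_embeddings, k=2)
print(ranked)
```

Row 0 points in the same direction as the query, so it ranks first; row 2 is at 45 degrees and comes second.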


January 11, 2019


CNN and softmax

Softmax is a type of normalization that amplifies the differences between numbers, so the largest value stands out more clearly. This is especially useful in image processing.

import numpy as np
nums = np.array([4, 5, 6])

from sklearn import preprocessing
preprocessing.normalize([nums])
array([[0.45584231, 0.56980288, 0.68376346]])

def softmax(A):
    expA = np.exp(A)
    return expA / expA.sum()

softmax(nums)
array([0.09003057, 0.24472847, 0.66524096])
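
One caveat: `np.exp` overflows for large inputs. A common variant subtracts the maximum before exponentiating, which leaves the result unchanged. A minimal sketch, with `stable_softmax` as a hypothetical name:

```python
import numpy as np

def stable_softmax(values):
    """Softmax that subtracts the max first to avoid overflow in np.exp."""
    values = np.asarray(values, dtype=float)
    shifted = values - values.max()  # largest entry becomes 0, so exp() <= 1
    exp_values = np.exp(shifted)
    return exp_values / exp_values.sum()

small = stable_softmax([4, 5, 6])
large = stable_softmax([1000, 1001, 1002])
print(small)
print(large)
```

Both calls give the same probabilities, because softmax depends only on the differences between the inputs.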

After applying kernel channel processing, softmax can reduce the number of features.


January 08, 2019


pandas case study 9

Is there any way to change the order of groupby output?

For example, in this case I will get "India" first. How do I change the order so that "USA" comes first, followed by "India"?

from io import StringIO
import pandas as pd

myst="""India, 905034 , 19:44 
USA, 905094  , 19:33
Russia,  905154 ,   21:56
"""

u_cols=['country', 'index', 'current_tm']

df = pd.read_csv(StringIO(myst), sep=',', names = u_cols)

# any aggregation works here; max() is used for illustration
df.groupby('country')['index'].max()

India 905034
Russia 905154
USA 905094

from pandas.api.types import CategoricalDtype
cats_to_order = ["USA", "India", "Russia"]
covered_type = CategoricalDtype(categories=cats_to_order, ordered=True)

df['country'] = df['country'].astype(covered_type)
df.groupby('country')['index'].max()

USA 905094
India 905034
Russia 905154
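
Another option, sketched here under the assumption that the column type should stay as plain strings, is to reorder the grouped result itself with `reindex`. The data frame is rebuilt inline and `max()` is an arbitrary aggregation chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["India", "USA", "Russia"],
    "index": [905034, 905094, 905154],
})

desired_order = ["USA", "India", "Russia"]

# groupby sorts the keys alphabetically; reindex then imposes the wanted order
result = df.groupby("country")["index"].max().reindex(desired_order)
print(result)
```

The categorical approach makes the order stick for every later operation on the column, while `reindex` only affects this one result.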

newspaper module for python

Here is a useful Python module to scrape text from any website.

# install newspaper module
!pip install newspaper3k

from newspaper import Article
article = Article('')




