# Shantanu's Blog

Database Consultant

## How does Elasticsearch work?

Let's assume we have three documents to be indexed in Elasticsearch, and that we want to score them for the keyword "desert".

d1: "This is the desert. There are no people in the desert. The Earth is large."

d2: "'Where are the people?' resumed the little prince at last. 'It's a little lonely in the desert…' 'It is lonely when you're among people, too,' said the snake."

d3: "'What makes the desert beautiful,' said the little prince, 'is that somewhere it hides a well'"
_____

Variables used:

λ : 0.1

1-λ : 0.9

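cf("desert") : 4, the total number of occurrences of "desert" across all documents in the collection (2 + 1 + 1)

Lc : 59, the total number of tokens in the collection (15 + 28 + 16)

Mc("desert") = (cf + 1) / (Lc + 1) = (4 + 1) / (59 + 1) = 5/60, the smoothed probability of "desert" in the whole collection

Md("desert") = tf / |d1| = 2/15, since "desert" occurs 2 times among the 15 tokens of d1

idf("desert") = log(total number of documents / number of documents containing the keyword), i.e. log(3/3) = log(1) = 0, because "desert" appears in every document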
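These statistics can be reproduced with a short Python sketch. The tokenizer is an assumption on my part, a rough stand-in for Elasticsearch's standard analyzer (lowercase, split on anything that is not a letter, keep in-word apostrophes such as "it's"); for these three sentences it gives the same 59 tokens.

```python
import math
import re

# The three example documents (d1, d2, d3) from this post.
DOCS = {
    "d1": "This is the desert. There are no people in the desert. The Earth is large.",
    "d2": ("'Where are the people?' resumed the little prince at last. "
           "'It's a little lonely in the desert…' 'It is lonely when you're "
           "among people, too,' said the snake."),
    "d3": ("'What makes the desert beautiful,' said the little prince, "
           "'is that somewhere it hides a well'"),
}


def tokenize(text):
    """Assumed approximation of the standard analyzer: lowercase words,
    keeping an apostrophe only when it sits between letters."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


tokens = {name: tokenize(text) for name, text in DOCS.items()}
lengths = {name: len(toks) for name, toks in tokens.items()}

term = "desert"
L_c = sum(lengths.values())                              # tokens in the collection
cf = sum(toks.count(term) for toks in tokens.values())   # occurrences in the collection
M_c = (cf + 1) / (L_c + 1)                               # smoothed collection model
M_d = tokens["d1"].count(term) / lengths["d1"]           # document model for d1
df = sum(term in toks for toks in tokens.values())       # documents containing the term
idf = math.log(len(DOCS) / df)

print(lengths)      # {'d1': 15, 'd2': 28, 'd3': 16}
print(L_c, cf, df)  # 59 4 3
print(M_c, M_d, idf)
```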
_____

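1) Classic:
The simplest form of similarity scoring, tf * idf, with tf taken as the relative frequency 2/15:

(2/15) * 0 = 0

Because "desert" occurs in every document, its idf is 0, so this similarity gives every document the same score (zero) for this keyword.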
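A small Python sketch of this textbook TF-IDF, using token counts taken from d1 to d3 (15, 28 and 16 tokens). For contrast it also scores "people", a term missing from d3, which therefore gets a positive idf. Note that Lucene's own ClassicSimilarity uses sqrt(freq) for tf and a smoothed idf (of the form 1 + log((N + 1) / (df + 1)) in recent versions), so its scores would not be exactly zero here.

```python
import math

# Counts taken from d1..d3: document lengths and per-term occurrences.
DOC_LEN = {"d1": 15, "d2": 28, "d3": 16}
FREQ = {
    "desert": {"d1": 2, "d2": 1, "d3": 1},
    "people": {"d1": 1, "d2": 2, "d3": 0},
}


def tf_idf(term, doc):
    """Textbook TF-IDF: relative term frequency times log(N / df)."""
    tf = FREQ[term][doc] / DOC_LEN[doc]
    df = sum(1 for f in FREQ[term].values() if f > 0)
    idf = math.log(len(DOC_LEN) / df)
    return tf * idf


for term in ("desert", "people"):
    print(term, {doc: round(tf_idf(term, doc), 4) for doc in DOC_LEN})
# desert {'d1': 0.0, 'd2': 0.0, 'd3': 0.0}
```

"desert" scores zero everywhere, while "people" separates d1 and d2 from d3.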

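2) BM25 similarity:
The default similarity in Elasticsearch (since version 5.0): a TF/IDF-based similarity with built-in tf normalization.

IDF * ((k1 + 1) * tf) / (k1 * (1.0 - b + b * (|d| / avgDl)) + tf)

Here tf is the raw number of occurrences in the document (2 for "desert" in d1), |d| is the document length (15 for d1), avgDl is the average document length (59 / 3 here), and k1 and b default to 1.2 and 0.75.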
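To see the formula in action, here is a Python sketch that scores "desert" in d1 to d3 with k (often written k1) = 1.2 and b = 0.75, Elasticsearch's defaults. Two assumptions: the IDF is the form Lucene uses for BM25, log(1 + (N - df + 0.5) / (df + 0.5)), which stays positive even when the term is in every document; and, as far as I know, recent Lucene versions leave out the constant (k1 + 1) factor and store document lengths approximately, so the `_score` Elasticsearch reports (check it with `explain`) may differ from these numbers.

```python
import math

# Statistics for the term "desert" in d1..d3 (counted from the documents above).
DOC_LEN = {"d1": 15, "d2": 28, "d3": 16}   # |d|, tokens per document
TF = {"d1": 2, "d2": 1, "d3": 1}           # raw occurrences of "desert"
N = len(DOC_LEN)                           # documents in the index
DF = sum(1 for f in TF.values() if f > 0)  # documents containing "desert"
AVG_DL = sum(DOC_LEN.values()) / N         # avgDl = 59 / 3
K1, B = 1.2, 0.75                          # Elasticsearch's default BM25 parameters


def bm25_idf(n_docs, df):
    """BM25 idf in the form Lucene uses; positive even when df == n_docs."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25(doc):
    tf = TF[doc]
    length_norm = 1.0 - B + B * (DOC_LEN[doc] / AVG_DL)
    return bm25_idf(N, DF) * ((K1 + 1) * tf) / (K1 * length_norm + tf)


for doc in DOC_LEN:
    print(doc, round(bm25(doc), 4))
```

d1 scores highest (two occurrences in a short document), then d3, then d2, whose 28 tokens push its length normalization above 1.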

3) Jelinek-Mercer smoothing:
The actual formula is:
log(1 + ((1-λ) * Md) / (λ * Mc))

Substituting the values for "desert" in d1:
1 + ((1-λ) * Md("desert")) / (λ * Mc("desert"))

translates to:
1 + (0.9 * (2/15)) / (0.1 * (5/60)) = 1 + 0.12 / (1/120) = 1 + 14.4

returns:
15.4

and the natural log (Python's math.log) is:
math.log(15.4)
2.7343675094195836
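The same arithmetic in Python, extended to all three documents. This is a sketch of the single-term Jelinek-Mercer score as written above; the document lengths and counts are the ones worked out from d1 to d3.

```python
import math

LAMBDA = 0.1                              # λ, Elasticsearch's default for LMJelinekMercer
DOC_LEN = {"d1": 15, "d2": 28, "d3": 16}  # tokens per document
TF = {"d1": 2, "d2": 1, "d3": 1}          # occurrences of "desert"
M_C = (4 + 1) / (59 + 1)                  # Mc("desert") = 5/60


def jm_score(doc, lam=LAMBDA):
    """log(1 + ((1 - λ) * Md) / (λ * Mc)) for the term "desert"."""
    m_d = TF[doc] / DOC_LEN[doc]          # Md: document model
    return math.log(1 + ((1 - lam) * m_d) / (lam * M_C))


for doc in DOC_LEN:
    print(doc, round(jm_score(doc), 4))   # d1 prints 2.7344, matching the value above
```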

Increasing λ (lambda) increases the weight of the collection model and reduces the weight of the document model. The Elasticsearch documentation suggests a value around 0.1 for title (short) queries and around 0.7 for long queries; the default is 0.1.
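With a single query term, λ cannot change the ranking: every document's score is log(1 + c * Md) with the same constant c, so documents stay ordered by Md. λ matters once per-term scores are summed. The sketch below uses a two-term query, "desert little", chosen only for illustration (counts taken from d1 to d3): at λ = 0.1, d2, which matches both terms, ranks above d1, which matches only "desert" (twice); at λ = 0.9, d1 moves ahead of d2.

```python
import math

DOC_LEN = {"d1": 15, "d2": 28, "d3": 16}  # tokens per document
L_C = 59                                  # tokens in the collection
# Occurrences of each query term per document, counted from d1..d3 above.
TF = {
    "desert": {"d1": 2, "d2": 1, "d3": 1},
    "little": {"d1": 0, "d2": 2, "d3": 1},
}


def collection_prob(term):
    """Mc(term) = (cf + 1) / (Lc + 1), as in the variables above."""
    return (sum(TF[term].values()) + 1) / (L_C + 1)


def jm_score(query, doc, lam):
    """Sum of per-term Jelinek-Mercer scores; a missing term adds log(1) = 0."""
    score = 0.0
    for term in query:
        m_d = TF[term][doc] / DOC_LEN[doc]
        score += math.log(1 + ((1 - lam) * m_d) / (lam * collection_prob(term)))
    return score


query = ["desert", "little"]  # hypothetical two-term query
for lam in (0.1, 0.9):
    ranking = sorted(DOC_LEN, key=lambda d: jm_score(query, d, lam), reverse=True)
    print(lam, ranking)
# 0.1 ['d3', 'd2', 'd1']
# 0.9 ['d3', 'd1', 'd2']
```

A small λ rewards matching more of the query terms; a large λ lets a strong match on one term win.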
_____

And this is how to test it:

https://gist.github.com/shantanuo/e203fb336ff0712f502c73a43cd85d75
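One way to try this against a live cluster, sketched in Python with only the standard library and not necessarily what the gist above does. It assumes an Elasticsearch 7.x or later node at localhost:9200; the index name `prince` and the similarity name `jm` are made up. The `similarity` block in the index settings is how a text field is switched from the default BM25 to LMJelinekMercer.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: a local Elasticsearch 7.x+ node
INDEX = "prince"                   # hypothetical index name


def index_body(lam=0.1):
    """Index settings and mapping that score the `text` field with
    Jelinek-Mercer smoothing (the similarity name `jm` is arbitrary)."""
    return {
        "settings": {
            "index": {"similarity": {"jm": {"type": "LMJelinekMercer", "lambda": lam}}}
        },
        "mappings": {"properties": {"text": {"type": "text", "similarity": "jm"}}},
    }


def es(method, path, body=None):
    """Send a JSON request to Elasticsearch and return the decoded response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        ES_URL + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a node running, create the index with `es("PUT", "/" + INDEX, index_body())`, add each document with `es("PUT", "/" + INDEX + "/_doc/d1?refresh=true", {"text": "..."})`, then search with `es("POST", "/" + INDEX + "/_search", {"query": {"match": {"text": "desert"}}, "explain": True})` to see how each `_score` was computed. The reported scores may differ slightly from the hand calculation because Lucene may store document lengths approximately.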


## Pushing notifications to a Slack channel

It is very easy to push notifications to Slack from the Linux command line. Lots of alerts clutter our inbox; why not send them to a Slack channel instead?

\$ pip install slack-cli

\$ /usr/local/bin/slack-cli -d random "Hello everyone! This is from slack api https://github.com/regisb/slack-cli"

Yes! One line to install the package and another to send a message to any channel (e.g. random). Shell scripts that send emails such as "replication status", "backup status" or "uptime status" can be configured to also post to a Slack channel.
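A sketch of how one of those status scripts could post its result, written in Python: it runs a check command and pipes the output to slack-cli with the same `-d <channel>` usage shown above. The `report` helper and its `send` hook (useful for testing without Slack) are hypothetical names, not part of slack-cli.

```python
import subprocess

SLACK_CLI = "/usr/local/bin/slack-cli"  # path used in this post


def report(check_name, command, channel="notifications", send=None):
    """Run a status command and post '<check_name>: OK|FAILED' plus its output.

    By default the message is piped to slack-cli on stdin, the same as
    `command | slack-cli -d channel`; `send` lets you swap in another sender."""
    result = subprocess.run(command, capture_output=True, text=True)
    status = "OK" if result.returncode == 0 else f"FAILED (exit {result.returncode})"
    message = f"{check_name}: {status}\n{result.stdout.strip()}"
    if send is None:
        subprocess.run([SLACK_CLI, "-d", channel], input=message, text=True, check=True)
    else:
        send(channel, message)
    return message
```

For example, `report("uptime status", ["uptime"])` would post the output of `uptime` to the notifications channel; a replication or backup script can be wrapped the same way.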

Slack has supported incoming webhooks, which you can call with curl, from day one, but putting the curl command together is a bit tedious. This open-source Python client makes it very easy to pipe data to Slack in real time.
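For comparison, here is roughly what the webhook route looks like, sketched with Python's standard library instead of curl: an incoming webhook takes a POST whose JSON body carries the message in a `text` field. The URL below is a placeholder; a real one comes from the Incoming Webhooks settings of a Slack app.

```python
import json
import urllib.request

# Placeholder, not a real webhook: copy yours from the Slack app's
# "Incoming Webhooks" page.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def build_webhook_request(text, url=WEBHOOK_URL):
    """The POST that an incoming webhook expects: a JSON body with a `text` field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def post_webhook(text, url=WEBHOOK_URL):
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(build_webhook_request(text, url), timeout=10) as resp:
        return resp.status
```

`post_webhook("backup status: OK")` is the Python counterpart of `curl -X POST -H 'Content-type: application/json' --data '{"text":"backup status: OK"}'` against the same URL.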

From api.slack.com, select "custom-integrations" and then "legacy-tokens" to generate the token that slack-cli asks for the first time you run it. Or go straight to this page:

https://api.slack.com/custom-integrations/legacy-tokens
_____

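Here are a few sample commands that report on MySQL to a Slack channel named "notifications": the last lines of /var/log/mysqld.log, the .frm files modified in the last 24 hours, and the list of databases.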

tail /var/log/mysqld.log | /usr/local/bin/slack-cli -d notifications

find /var/lib/mysql/ -name "*.frm" -mtime 0  | /usr/local/bin/slack-cli -d notifications

/usr/local/bin/slack-cli -d notifications "\`mysql -e 'show databases'\`"
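The `find ... -mtime 0` check can also be sketched in Python, which makes the "modified less than 24 hours ago" rule explicit. `recently_changed` and `post` are hypothetical helper names; `post` pipes text to slack-cli exactly like the one-liners above. Also keep in mind that MySQL 8.0 no longer writes .frm files, so on newer servers the pattern would need changing.

```python
import subprocess
import time
from pathlib import Path

SLACK_CLI = "/usr/local/bin/slack-cli"  # path used in this post
ONE_DAY = 24 * 3600


def recently_changed(datadir="/var/lib/mysql", pattern="*.frm", now=None):
    """Roughly `find DATADIR -name PATTERN -mtime 0`: files matching PATTERN,
    at any depth, modified less than 24 hours ago."""
    now = time.time() if now is None else now
    return sorted(
        str(p)
        for p in Path(datadir).rglob(pattern)
        if p.is_file() and now - p.stat().st_mtime < ONE_DAY
    )


def post(channel, text):
    """Pipe text to slack-cli, like `... | slack-cli -d channel`."""
    subprocess.run([SLACK_CLI, "-d", channel], input=text, text=True, check=True)
```

A cron job could call `recently_changed()` and, only when the list is non-empty, `post("notifications", "\n".join(changed))`; skipping the message on a quiet day is a design choice of this sketch.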
