Shantanu's Blog

Database Consultant

January 07, 2021

 

SQL and PPL support by AWS Elasticsearch (Open Distro for ES)

AWS Elasticsearch now supports standard SQL syntax. For system admins, it also supports PPL (Piped Processing Language). Here is an example of each:

select userAgent, eventID from newcwl where requestParameters.bucketName.keyword like 'web%' and (eventName.keyword like 'PutObject%' OR eventName.keyword like 'UploadPartCopy%' OR eventName.keyword like 'UploadPart%') ;

And this is PPL syntax:

search source=newcwl eventSource.keyword='s3.amazonaws.com' | where eventName.keyword like 'PutObject%' or eventName.keyword like 'UploadPart%' or eventName.keyword like 'UploadPartCopy%' | where requestParameters.bucketName.keyword like "web%" | fields userAgent, eventID

This is a really great feature. I had been looking for something like this for years!
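Both queries can also be submitted over the REST API. The Open Distro SQL plugin exposes _opendistro/_sql and _opendistro/_ppl endpoints; here is a minimal sketch using urllib3, with a placeholder endpoint and no authentication (adjust for your domain's access policy):

import json
import urllib3

http = urllib3.PoolManager()
host = "https://xxx.us-east-1.es.amazonaws.com"   # placeholder domain endpoint

# run a SQL query through the _opendistro/_sql endpoint
sql = {"query": "select userAgent, eventID from newcwl where eventName.keyword like 'PutObject%'"}
r = http.request("POST", host + "/_opendistro/_sql",
                 headers={"Content-Type": "application/json"}, body=json.dumps(sql))
print(r.status, r.data[:200])

# the PPL endpoint works the same way
ppl = {"query": "search source=newcwl | where eventName.keyword like 'PutObject%' | fields userAgent, eventID"}
r = http.request("POST", host + "/_opendistro/_ppl",
                 headers={"Content-Type": "application/json"}, body=json.dumps(ppl))
print(r.status, r.data[:200])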



December 13, 2020

 

get elastic data into pandas dataframe

These two lines of code will connect to an Elastic instance hosted in the Amazon cloud and read the data from a given index. The result behaves like a pandas DataFrame, with the JSON data flattened to look like an Excel spreadsheet.

import eland as ed

df = ed.DataFrame("https://xxx.us-east-1.es.amazonaws.com:443", es_index_pattern="cwl-2020.12.13")
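Note that the eland DataFrame is a lazy, index-backed view; if you want the documents materialized as a regular in-memory pandas DataFrame, eland_to_pandas does that (a sketch, using the same placeholder endpoint and index):

import eland as ed

df = ed.DataFrame("https://xxx.us-east-1.es.amazonaws.com:443", es_index_pattern="cwl-2020.12.13")

# pull the documents into a regular in-memory pandas DataFrame
pdf = ed.eland_to_pandas(df)
print(pdf.shape)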



March 26, 2020

 

Packetbeat to elastic server hosted by AWS

These are the 3 steps to follow if you want to push Packetbeat or Metricbeat data to an AWS Elasticsearch instance.

Make sure that your Elasticsearch instance has whitelisted the IP address of the server where you are installing Packetbeat.

1) Install
2) Configure
3) Start the service

1) Installation of beats depends upon the processor type, the OS, and whether you are connecting to AWS (which requires the OSS build):

arm + ubuntu + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-arm64.deb
dpkg -i packetbeat-oss-7.11.0-arm64.deb

arm + ubuntu + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-arm64.deb
dpkg -i packetbeat-7.11.0-arm64.deb

arm + redhat + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-aarch64.rpm
rpm -ivh packetbeat-oss-7.11.0-aarch64.rpm

arm + redhat + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-aarch64.rpm
rpm -ivh packetbeat-7.11.0-aarch64.rpm

Intel + ubuntu + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-amd64.deb
dpkg -i packetbeat-oss-7.11.0-amd64.deb

Intel + ubuntu + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-amd64.deb
dpkg -i packetbeat-7.11.0-amd64.deb

Intel + redhat + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-x86_64.rpm
rpm -ivh packetbeat-oss-7.11.0-x86_64.rpm

Intel + redhat + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-x86_64.rpm
rpm -ivh packetbeat-7.11.0-x86_64.rpm


2) Open config file and change 2 settings as shown below:
vi /etc/packetbeat/packetbeat.yml
hosts: ["search-audit-ay52ddaf7q7zudixfmluji4osmx.us-east-1.es.amazonaws.com:443"]
protocol: "https"

If you are using a server on the local network, mention its address in the hosts list. The protocol option can be omitted if you are not using a secure HTTP connection.

Optionally, enable the "send_response" parameter to capture the query and its output as shown below:

packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true
  max_row_length: 4000

The max_row_length option allows you to capture lengthy queries and their large output.

Disable the X-Pack settings if you are connecting to AWS or using the open-source (OSS) build:

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
setup.ilm.enabled: false
setup.xpack.security.enabled: false
setup.xpack.graph.enabled: false
setup.xpack.watcher.enabled: false
setup.xpack.monitoring.enabled: false
setup.xpack.reporting.enabled: false

3) Start packetbeat service
cd /usr/bin/
./packetbeat  -e -c packetbeat.yml
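Once the service is running, you can confirm that events are reaching the cluster by searching the packetbeat index. A minimal sketch using the elasticsearch Python client; the endpoint is a placeholder and assumes your IP is allowed by the domain's access policy:

from elasticsearch import Elasticsearch

# placeholder endpoint; use your own AWS Elasticsearch domain
es = Elasticsearch("https://search-audit-xxx.us-east-1.es.amazonaws.com:443")

resp = es.search(index="packetbeat-*", size=1)
print(resp["hits"]["total"])   # count of captured events so far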



 

snapshot and restore elastic data to S3

Here are the steps to back up and restore Elasticsearch data on an AWS Elasticsearch instance.

1) Create IAM role
2) Use the boto script to create snapshot repo
3) Use Kibana to take the actual snapshot

All the steps are explained here...

https://forums.aws.amazon.com/message.jspa?messageID=930345#930345

Here is step-by-step guidance for the same.

1) Create IAM role:
Use the following CloudFormation template to create the role, and note the ARN of the role to be used in the boto script.

https://github.com/shantanuo/cloudformation/blob/master/updated/esbck.yml

2) Run this boto script:
Change the Access and Secret key. Also change the Elastic endpoint. Add the ARN created in the first step.

from boto.connection import AWSAuthConnection

class ESConnection(AWSAuthConnection):
    def __init__(self, region, **kwargs):
        super(ESConnection, self).__init__(**kwargs)
        self._set_auth_region_name(region)
        self._set_auth_service_name("es")

    def _required_auth_capability(self):
        return ["hmac-v4"]

client = ESConnection(
    region="us-east-1",
    host="search-training-foz7enh73fbg6lof23z7kbtn3y.us-east-1.es.amazonaws.com",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    is_secure=False,)

headers = {"Content-Type": "application/json"}
resp = client.make_request(
    method="PUT",
    headers=headers,
    path="/_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt",
    data='{"type": "s3","settings": { "bucket": "esbck-essnapshotbucket-c9e6d7fy1cbt","region": "us-east-1", "role_arn": "arn:aws:iam::1234567890:role/esbck-EsSnapshotRole-GJGMPH4DBMM3"}}')

resp.read()
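To verify that the repository was registered and the snapshot completed, the same ESConnection client can be reused to query the snapshot API (a sketch, continuing the script above):

# list all snapshots in the repository created above
resp = client.make_request(
    method="GET",
    path="/_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/_all",
)
print(resp.read())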

3) Take the backup from kibana

PUT /_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/snapshot_1

GET /_cat/indices

DELETE /cwl-2020.03.26

POST /_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/snapshot_1/_restore

GET /_snapshot/

GET /_cat/snapshots/esbck-essnapshotbucket-c9e6d7fy1cbt



March 07, 2020

 

Using wildcards instead of match_phrase

Instead of using a "match_phrase" clause for every value, I can use a "wildcard" clause and replace sub-domains like config and cloudtrail with a star (*).

This will save typing additional clauses and keep the query short.

Old query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "sourceIPAddress": "1.2.3.4"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "5.6.7.8"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "config.amazonaws.com"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "cloudtrail.amazonaws.com"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
_____

New Improved query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "sourceIPAddress": "1.2.3.4"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "5.6.7.8"
          }
        },
        {
          "wildcard": {
            "sourceIPAddress.keyword": {
              "value": "*\\.amazonaws\\.com*"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
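The same query can be run from Python with the official client. A minimal sketch, using a placeholder endpoint and index pattern:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")    # placeholder endpoint

query = {
    "bool": {
        "should": [
            {"match_phrase": {"sourceIPAddress": "1.2.3.4"}},
            {"wildcard": {"sourceIPAddress.keyword": {"value": "*\\.amazonaws\\.com*"}}},
        ],
        "minimum_should_match": 1,
    }
}

resp = es.search(index="cwl-*", body={"query": query})    # placeholder index pattern
print(resp["hits"]["total"])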



November 24, 2019

 

sentiment analysis using twitter data

Here is a nice tutorial with a CloudFormation template to enable Twitter sentiment analysis.

https://aws.amazon.com/blogs/machine-learning/exploring-images-on-social-media-using-amazon-rekognition-and-amazon-athena/

And here is another blog post about reading text from images...

https://aws.amazon.com/blogs/machine-learning/building-an-nlp-powered-search-index-with-amazon-textract-and-amazon-comprehend/

If you want to import the data into Elasticsearch instead of Athena as suggested in the article, you will need to add the following code to the Lambda function.

import urllib3
http = urllib3.PoolManager()

host_senti = 'https://search-xxx.us-east-1.es.amazonaws.com/mysentiments/senti/'
host_enti = 'https://search-xxx.us-east-1.es.amazonaws.com/myentities/enti/'
host_rekon = 'https://search-xxx.us-east-1.es.amazonaws.com/myrekognitions/rekon/'

Add an http.request call after each firehose.put_record call, as shown below.

        firehose.put_record(DeliveryStreamName=sentiment_stream, Record= { 'Data' :json.dumps(sentiment_record) + '\n'})
        http.request('POST', host_senti+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(sentiment_record) )

            firehose.put_record(DeliveryStreamName=entity_stream, Record= { 'Data' : json.dumps(entity_record) + '\n'} )
            http.request('POST', host_enti+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(entity_record) )

                firehose.put_record(DeliveryStreamName=rekognition_stream, Record= { 'Data' :json.dumps(image_rekognition_record) + '\n'})
                http.request('POST', host_rekon+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(image_rekognition_record) )



November 22, 2019

 

elastic indexing using python built-in module

Here is another way of inserting a document into an Elasticsearch database. This is the same as using the requests module or curl. The only difference is that it uses the urllib3 module, which is available in the AWS Lambda runtime without any extra packaging.

import json
import urllib3

http = urllib3.PoolManager()

# index document "myid" into index "myyl2", type "myt"
host = 'https://search-test-xxx.us-east-1.es.amazonaws.com/myyl2/myt/myid'
some_data_structure = {"test": "this is one doc"}

def lambda_handler(event, context):
    resp = http.request('POST', host, headers={'Content-Type': 'application/json'}, body=json.dumps(some_data_structure))
    return resp.status
 



November 19, 2019

 

Packetbeat with geoip

1) Start elastic docker container
docker run --disable-content-trust --name myelastic -p 9200:9200 -p 5601:5601 -d shantanuo/elastic

2) Login to elastic container and install geoip plugin

docker exec -it myelastic sh

sh /home/elasticsearch/elasticsearch/bin/elasticsearch-plugin install ingest-geoip

3) Restart elastic container
docker restart myelastic

4) Create pipeline
PUT /_ingest/pipeline/geoip-info?pretty
{
  "description": "Add geoip info",
  "processors": [
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "client.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "destination.ip",
        "target_field": "destination.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "server.ip",
        "target_field": "server.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "host.ip",
        "target_field": "host.geo",
        "ignore_missing": true
      }
    }
  ]
}
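Before pointing packetbeat at the pipeline, it can be checked with the ingest simulate API. A small sketch, assuming the elastic container is reachable on localhost:9200:

import json
import urllib3

http = urllib3.PoolManager()

# push one fake document through the geoip-info pipeline and inspect the enriched output
doc = {"docs": [{"_source": {"client": {"ip": "8.8.8.8"}}}]}
resp = http.request(
    "POST",
    "http://localhost:9200/_ingest/pipeline/geoip-info/_simulate",
    headers={"Content-Type": "application/json"},
    body=json.dumps(doc),
)
print(resp.data.decode())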

5) Download config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/master/deploy/docker/packetbeat.docker.yml

6) Modify config file by adding pipeline parameter to output section

output.elasticsearch:
  pipeline: geoip-info

7) Start packetbeat container
docker run -d \
  --disable-content-trust \
  --name=packetbeaten2 \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  --entrypoint=bash \
  -it  docker.elastic.co/beats/packetbeat-oss:7.0.0

8) Login to the packetbeat container and start the packetbeat service
docker exec -it packetbeaten2 bash

./packetbeat --strict.perms=false -e \
  -E 'output.elasticsearch.hosts=["172.31.12.85:9200"]' \
  -E output.elasticsearch.pipeline=geoip-info



October 16, 2019

 

Create elastic instance using template

This CloudFormation template will create an Elasticsearch instance that can be used for testing.
It will not have a password and will be accessible from any IP address. That makes it insecure, but it is good enough for testing with dummy data.

Resources:
  ElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    Properties:
      DomainName: "testes"
      ElasticsearchClusterConfig:
        InstanceCount: "1"
        InstanceType: "m3.medium.elasticsearch"
      AccessPolicies:
        Statement:
          -
            Effect: "Allow"
            Principal:
              AWS: "*"
            Action: "es:*"
            Resource: !Sub "arn:aws:es:us-east-1:${AWS::AccountId}:domain/testes/*"
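The template can be deployed from the console or from Python with boto3; a sketch, where the local file name and stack name are illustrative:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# template saved locally as testes.yml (illustrative file name)
with open("testes.yml") as f:
    body = f.read()

cfn.create_stack(StackName="testes-stack", TemplateBody=body)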



October 01, 2019

 

Using embeddings for similarity search

Let’s suppose we have a large collection of questions and answers. A user can ask a question, and we want to retrieve the most similar question in our collection to help them find an answer.

* "zipping up files" should return "Compressing / Decompressing Folders & Files"
* "determine if something is an IP" should return "How do you tell whether a string is an IP or a hostname"
* "translate bytes to doubles" should return "Convert Bytes to Floating Point Numbers in Python"

In each case, that is the closest entry in the collection.

The indexed document should look like this...

{'user': '5156',
 'tags': ['xcode', 'git', 'osx', 'version-control', 'gitignore'],
 'questionId': '49478',
 'creationDate': '2008-09-08T11:07:49.953',
 'title': 'Git ignore file for Xcode projects',
 'acceptedAnswerId': '12021580',
 'type': 'question',
 'body': 'Which files should I include in .gitignore when using Git in conjunction with Xcode? ',
 'title_vector': [0.031643908470869064,
  -0.04750939458608627,
  -0.04847564920783043,
...

  0.001153663732111454,
  0.04351674020290375]}

The title_vector has exactly 512 elements for all records, irrespective of the number of words in the title. This is because we are using Google TensorFlow's "Universal Sentence Encoder".

https://tfhub.dev/google/universal-sentence-encoder/2

Here is the article on this topic:
https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch

And github repo:
https://github.com/jtibshirani/text-embeddings

If you want to test the application:

docker run --name text_embeddings  -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node"  -d shantanuo/textembeddings
docker exec -it text_embeddings bash
cd text-embeddings/
python3.6 src/main.py
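The retrieval itself can be done with a script_score query that compares the stored title_vector with the query embedding using cosine similarity. A sketch of such a query with the Python client, assuming Elasticsearch 7.3 and an illustrative index name; the query vector would normally come from the same Universal Sentence Encoder:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")     # placeholder endpoint

# placeholder; in practice this is the 512-dimension encoder output for the user's question
query_vector = [0.0] * 512

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['title_vector']) + 1.0",
            "params": {"query_vector": query_vector},
        },
    }
}

resp = es.search(index="posts", body={"size": 5, "query": script_query, "_source": ["title"]})  # illustrative index name
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])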



September 30, 2019

 

Create pandas dataframe using elastic beats data

Here are 5 steps to create a pandas dataframe using the packetbeat data

1) Start elastic container
2) Download and configure packetbeat config file
3) Start packetbeat container
4) Login to packetbeat container and start service
5) Import packetbeat data into pandas dataframe
_____

# start elastic
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node"   --log-driver json-file  -d elasticsearch:7.3.1

# download packetbeat config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.0/deploy/docker/packetbeat.docker.yml

# add send_response parameter to mysql section of config file and change host address
vi packetbeat.docker.yml

packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true

output.elasticsearch:
  hosts: 'some_site.com:9200'

# start a packetbeat container

docker run \
  --name pbeat \
  --disable-content-trust \
  --log-driver json-file \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  -d docker.elastic.co/beats/packetbeat:7.0.0

# login to packetbeat container and start packetbeat service
docker exec -it pbeat bash
cd /usr/share/packetbeat/
./packetbeat -e

# download the packetbeat data to pandas dataframe:

import pandas as pd
import numpy as np

import elasticsearch
from elasticsearch import helpers

es_client = elasticsearch.Elasticsearch("http://some_site.com:9200")

mylist = list()

# scan through every packetbeat document and collect the "client" section;
# documents without a client section get a placeholder row
for r in helpers.scan(es_client, index="packetbeat-7.0.0-2019.10.01-000001"):
    try:
        mylist.append(r["_source"]["client"])
    except KeyError:
        mylist.append({"ip": np.nan, "port": np.nan, "bytes": np.nan})

df = pd.DataFrame(mylist)
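If you want more than the client section, the whole _source of each document can be flattened into spreadsheet-like columns with pandas' json_normalize (a sketch, reusing the client and index above; json_normalize needs a reasonably recent pandas):

# flatten every nested field of each document into dotted column names
docs = [r["_source"] for r in helpers.scan(es_client, index="packetbeat-7.0.0-2019.10.01-000001")]
df_full = pd.json_normalize(docs)
print(df_full.columns)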



April 30, 2019

 

elastic watcher alerts

Here is the code to set up email alerts when an event occurs.

# install watcher plugin

bin/plugin install elasticsearch/license/latest
bin/plugin install elasticsearch/watcher/latest

# add email section to elasticsearch.yml file

watcher.actions.email.service.account:
    gmail_account:
        profile: gmail
        smtp:
            auth: true
            starttls.enable: true
            host: smtp.gmail.com
            port: 587
            user: shantanu.XXX
            password: XXX

# re-start elasticsearch and add a watcher document

curl -XPUT 'http://1.2.3.4:9200/_watcher/watch/log_event_watch' -d '{
  "metadata" : {
    "color" : "red"
  },
  "trigger" : {
    "schedule" : {
      "interval" : "5m"
    }
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : "packet*",
        "body" : {
          "size" : 0,
          "query" : { "match" : { "status" : "error" } }
        }
      }
    }
  },
  "condition" : {
    "script" : "return ctx.payload.hits.total > 5"
  },
  "actions" : {
    "email_administrator" : {
      "throttle_period": "15m",
      "email" : {
        "to" : "shantanu1234@gmail.com",
        "subject" : "Encountered {{ctx.payload.hits.total}} errors",
        "body" : "Too many error in the system, see attached data",
        "attachments" : {
          "attached_data" : {
            "data" : {
              "format" : "json"
            }
          }
        },
        "priority" : "high"
      }

    }
  }
}'



April 28, 2019

 

MySQL error log using Elastic stack

Here are 3 easy steps to log failed MySQL queries.
1) Download packetbeat config file
2) Edit config file to add "send_response" parameter
3) Start docker container

# download packetbeat config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.0/deploy/docker/packetbeat.docker.yml

# add send_response parameter to mysql
packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
setup.ilm.enabled: false
setup.xpack.security.enabled: false
setup.xpack.graph.enabled: false
setup.xpack.watcher.enabled: false
setup.xpack.monitoring.enabled: false
setup.xpack.reporting.enabled: false

# start docker container
docker run \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  -d docker.elastic.co/beats/packetbeat:7.0.0 \
  --strict.perms=false -e \
  -E cloud.id=XXX \
  -E cloud.auth=elastic:XXX

# Once you get logs in Kibana, use a filter type:mysql and status:Error to extract failing queries.
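The same filter can also be applied programmatically. A minimal sketch with the Python client, using a placeholder endpoint (add http_auth for the cloud deployment):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder; for cloud use http_auth=("elastic", "...")

body = {
    "query": {"bool": {"must": [
        {"match": {"type": "mysql"}},
        {"match": {"status": "Error"}},
    ]}},
    "_source": ["query", "status"],
    "size": 10,
}

resp = es.search(index="packetbeat-*", body=body)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])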



April 28, 2018

 

How does elasticsearch work?

Let's assume we have 3 documents to be indexed in elastic.

d1: "This is the desert. There are no people in the desert. The Earth is large."

d2: "'Where are the people?' resumed the little prince at last. 'It's a little lonely in the desert…' ,' It is lonely when you're among people, too,' said the snake."

d3: " 'What makes the desert beautiful,' said the little prince, 'is that somewhere it hides a well' "
_____

Variables used:

λ : 0.1

1-λ : 0.9

tf("desert"): 4, total number of occurence of "desert" in the collection across all documents

Lc : 59, total number of tokens in collection

Mc("desert") = (4 + 1) / (59 + 1) = 5/60

Md("desert") = tf = 2/15

idf("desert") = log(total number of documents / number of documents in which keyword found) i.e. log(3/3) = log(1) = 0
_____

1) Classic:
Simplest form of similarity search
tf * idf
(2/15) * 0 = 0


2) BM25 similarity: 
the default Elasticsearch similarity: TF/IDF based, with built-in tf normalization

IDF * ((k + 1) * tf) / (k * (1.0 - b + b * (|d|/avgDl)) + tf)


3) Jelinek Mercer smoothing:
The actual formula is:
log(1 + ((1-λ) * Md) / (λ * Mc))

With the values substituted:
(1 + ((1-λ) * Md("desert")) / (λ * Mc("desert")))

this translates to:
(1 + ((0.9) * (2/15)) / (0.1 * (5/60)))

returns:
15.4

and log is:
math.log(15.4)
2.7343675094195836

By increasing λ (lambda), we increase the importance of the collection model and diminish the importance of the document model.
This is a good choice for longer queries.
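A quick check of the arithmetic above in Python:

import math

lam = 0.1        # λ : weight of the collection model
md = 2 / 15      # Md("desert") : term frequency in document d1
mc = 5 / 60      # Mc("desert") : smoothed collection frequency

score = math.log(1 + ((1 - lam) * md) / (lam * mc))
print(score)     # 2.7343675094195836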
_____

And this is how to test it:

https://gist.github.com/shantanuo/e203fb336ff0712f502c73a43cd85d75



February 25, 2018

 

Install and configure packetbeat to monitor mysql traffic

1) Install packetbeat
deb:
sudo apt-get install libpcap0.8
curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-6.2.2-amd64.deb
sudo dpkg -i packetbeat-6.2.2-amd64.deb

rpm:
sudo yum install libpcap
curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-6.2.2-x86_64.rpm
sudo rpm -vi packetbeat-6.2.2-x86_64.rpm

2) Make sure that the "query" property in the "mysql" section is "text" and not "keyword".

[root@localhost packetbeat]# vi packetbeat.template-es6x.json

        "mysql": {
          "properties": {
            "affected_rows": {
              "type": "long"
            },
             "query": {
              "type": "text"
            }
          }
        },
        "nfs": {
          "properties": {
            "minor_version": {


3) Change the host, protocol and password in the Elasticsearch output section of the config file. Enable template overwriting and make sure the 6.x version of the template will be loaded.

[root@localhost packetbeat]# vi packetbeat.yml

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://944fe807b7525eaf163f502e08a412c5.us-east-1.aws.found.io:9243"]
  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "elastic"
  password: "xxx"

 # Set to false to disable template loading.
  template.enabled: true

  # Template name. By default the template name is packetbeat.
  template.name: "packetbeat"

  # Path to template file
  template.path: "${path.config}/packetbeat.template.json"

  # Overwrite existing template
  template.overwrite: true

  # If set to true, packetbeat checks the Elasticsearch version at connect time, and if it
  # is 2.x, it loads the file specified by the template.versions.2x.path setting. The
  # default is true.
  template.versions.2x.enabled: false

  # If set to true, packetbeat checks the Elasticsearch version at connect time, and if it
  # is 6.x, it loads the file specified by the template.versions.6x.path setting. The
  # default is true.
  template.versions.6x.enabled: true

  # Path to the Elasticsearch 6.x version of the template file.
  template.versions.6x.path: "${path.config}/packetbeat.template-es6x.json"


4) Check the logs to confirm that everything is being loaded correctly.

[root@localhost packetbeat]# cat /var/log/packetbeat/packetbeat| more
2018-02-25T11:53:30+05:30 INFO Metrics logging every 30s
2018-02-25T11:53:30+05:30 INFO Loading template enabled for Elasticsearch 6.x. Reading template file: /etc/packetbeat/packetbeat.template-es6x.json
2018-02-25T11:53:30+05:30 INFO Elasticsearch url: https://944fe807b7525eaf163f502e08a412c.us-east-1.aws.found.io:9243
2018-02-25T11:53:30+05:30 INFO Activated elasticsearch as output plugin.
2018-02-25T11:53:30+05:30 INFO Publisher name: localhost.localdomain
2018-02-25T11:53:30+05:30 INFO Flush Interval set to: 1s
2018-02-25T11:53:30+05:30 INFO Max Bulk Size set to: 50
2018-02-25T11:53:30+05:30 INFO Process matching disabled
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: amqp
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: mongodb
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: mysql
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: nfs
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: pgsql
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: thrift
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: cassandra
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: dns
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: http
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: memcache
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: redis
2018-02-25T11:53:30+05:30 INFO packetbeat start running.
2018-02-25T11:53:32+05:30 INFO Connected to Elasticsearch version 6.2.2
2018-02-25T11:53:32+05:30 INFO Trying to load template for client: https://944fe807b7525eaf163f502e08a412c.us-east-1.aws.found.io:9243
2018-02-25T11:53:32+05:30 INFO Existing template will be overwritten, as overwrite is enabled.
2018-02-25T11:53:32+05:30 INFO Detected Elasticsearch 6.x. Automatically selecting the 6.x version of the template
2018-02-25T11:53:33+05:30 INFO Elasticsearch template with name 'packetbeat' loaded

_____

Or use docker image:

[root@localhost ~]# docker run --cap-add=NET_ADMIN --network=host -e HOST="https://944fe807b7525eaf163f502e08a412c5.us-east-1.aws.found.io:9243" -e PASS="rzmYYJUdHVaglRejr8XqjIX7" shantanuo/packetbeat-agent

_____

# curl commands to connect to secure elastic (cloud)
curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/_aliases 

curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/_cat/indices/ 

curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/packetbeat-6.6.2-2019.03.26/_search?pretty=true&q=*:*



February 11, 2018

 

Backup elastic data to S3

Here are the 5 steps to take a backup of an Elasticsearch index:

1) Install s3 plugin
2) Set access and secret key
3) create repo
4) Take backup
5) Test if backup was successful

cd /home/elasticsearch/elasticsearch/bin/

# sh elasticsearch-plugin install repository-s3

# sh elasticsearch-keystore create
Created elasticsearch keystore in /home/elasticsearch/elasticsearch/config

# sh elasticsearch-keystore add  s3.client.default.access_key
Enter value for s3.client.default.access_key:

# sh elasticsearch-keystore add s3.client.default.secret_key
Enter value for s3.client.default.secret_key:

# curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository1" -H'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "todel162"
  }
}'

# curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository1/snap2" -H'Content-Type: application/json' -d'
{
   "indices": "products, index_1, index_2", 
   "ignore_unavailable": true,
   "include_global_state": false
}'

{"accepted":true}

#  curl http://localhost:9200/_cat/snapshots/my_s3_repository1?v

id     status start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
snap1 SUCCESS 1518343272  10:01:12   1518343272 10:01:12     52ms       0                 0             0            0
snap2 SUCCESS 1518343818  10:10:18   1518343818 10:10:18     64ms       0                 0             0            0
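Restoring works the same way in reverse: POST to the _restore endpoint of the snapshot. A sketch using urllib3, with an illustrative index name:

import json
import urllib3

http = urllib3.PoolManager()

body = {"indices": "products", "include_global_state": False}
resp = http.request(
    "POST",
    "http://localhost:9200/_snapshot/my_s3_repository1/snap2/_restore",
    headers={"Content-Type": "application/json"},
    body=json.dumps(body),
)
print(resp.status, resp.data.decode())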



June 29, 2017

 

Amazon User Activity Log

Here is a good article about how to integrate CloudTrail with Elasticsearch.

https://blog.powerupcloud.com/visualizing-aws-cloudtrail-logs-using-elk-stack-e3d3b399af43

Once I follow all these steps, I can log all Amazon activities into Elastic. This is super useful to know what changes are taking place in my account.



June 04, 2017

 

Packetbeat dashboards

In order to import the sample Kibana dashboards, we need to install Packetbeat first and then load the sample dashboards into the Elasticsearch server.

curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-5.4.1-x86_64.rpm

sudo rpm -vi packetbeat-5.4.1-x86_64.rpm

cd /usr/share/packetbeat/

./scripts/import_dashboards -es http://localhost:9200



June 03, 2017

 

Frequently used docker containers

Here are a few containers that I need most of the time.

1) elastic and kibana

a) elastic, kibana and packetbeat

docker run --disable-content-trust -p 9200:9200 -p 5601:5601 -d nshou/elasticsearch-kibana

docker run --cap-add=NET_ADMIN --net=host -e KIBANA="http://shantanuoak.com:5601" -e HOST="http://shantanuoak.com:9200" shantanuo/packetbeat-agent-unsecure

b) Connect to Elastic Cloud:

docker run --cap-add=NET_ADMIN --network=host -e KIBANA="https://6a16d771c4fc3be7f251c7c629a421e2.us-east-1.aws.found.io:9243" -e HOST="https://d322f42d01dc50c50dba0b446e6a1c0a.us-east-1.aws.found.io:9243" -e PASS="pwkbZXIB3VMPtr4wOnpLNi8c"  shantanuo/packetbeat-agent

c) Get the IP of the Elasticsearch host using the command hostname -i and then import the Metricbeat dashboards using docker

docker run docker.elastic.co/beats/metricbeat:5.5.0 ./scripts/import_dashboards  -es http://172.31.73.228:9200


2) python pandas using miniconda

docker run -i -t -p 8888:8888 -v /tmp:/tmp continuumio/miniconda3 /bin/bash -c "/opt/conda/bin/conda install jupyter -y && cd /tmp/ && /opt/conda/bin/jupyter notebook --NotebookApp.token='india' --notebook-dir=/tmp --ip='0.0.0.0' --port=8888 --no-browser --allow-root"

3) MySQL with a bug fixed and the IST timezone added

docker run -p 3399:3306 -e MYSQL_ROOT_PASSWORD=india3399 -v /my/custom3399:/etc/mysql/conf.d  -v /storage/mysql/datadir3399:/var/lib/mysql -d shantanuo/mysql:5.7

This container uses the config file as shown below:

# vi /my/custom3399/my.cnf
[mysqld]
server-id=1723169137

max_binlog_size=1024M
expire_logs_days=40
binlog_format=ROW
binlog_checksum=NONE

### enable master
# log-bin=/var/log/mysql/mysql-bin.log

### myisam only
# skip-innodb
# default-storage-engine=MyISAM
# default_tmp_storage_engine=MyISAM
# key-buffer-size=1G
# myisam_max_sort_file_size=40G
# myisam_sort_buffer_size=512M
# bulk_insert_buffer_size=1G
### disable strict sql mode
# sql-mode=''
# secure-file-priv = ""

### innodb setting
# innodb_buffer_pool_size=1G
# innodb_log_file_size=512M

# innodb_flush_method=O_DIRECT
# innodb_file_per_table
# innodb-flush-log-at-trx-commit = 2

# make sure temp directory has sufficient space
# tmpdir=/

4) Adminer container to manage mysql

docker run -p 80:80  -d  shantanuo/adminer /bin/bash -c "/usr/sbin/apache2ctl -D FOREGROUND "





November 01, 2016

 

sysdig for system admins

What about a tool for sysadmins that has all the utilities we use every day?
sysdig is a combination of strace + tcpdump + htop + iftop + lsof + transaction tracing.

It is an open source system-level exploration tool that captures system state and activity.

Here is how to install it...

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

And here are a few examples

Dump system activity to a file:
sysdig -w trace.scap

Show all the interactive commands executed inside a given container.
sysdig -pc -c spy_users container.name=wordpress1

View the top network connections for a single container.
sysdig -pc -c topconns container.name=wordpress1

See all the GET HTTP requests made by the machine
sudo sysdig -s 2000 -A -c echo_fds fd.port=80 and evt.buffer contains GET

See all the SQL select queries made by the machine
sudo sysdig -s 2000 -A -c echo_fds evt.buffer contains SELECT

See queries made via apache to an external MySQL server happening in real time
sysdig -s 2000 -A -c echo_fds fd.sip=192.168.30.5 and proc.name=apache2 and evt.buffer contains SELECT

More examples can be found here..

http://www.sysdig.org/wiki/sysdig-examples/#application


