Shantanu's Blog

Database Consultant

January 07, 2021

 

SQL and PPL support by AWS Elasticsearch (Open Distro for ES)

AWS Elasticsearch now supports standard SQL syntax. For system admins, it also supports PPL (Piped Processing Language). Here is an example of each:

select userAgent, eventID from newcwl where requestParameters.bucketName.keyword like 'web%' and (eventName.keyword like 'PutObject%' OR eventName.keyword like 'UploadPartCopy%' OR eventName.keyword like 'UploadPart%') ;

And this is PPL syntax:

search source=newcwl eventSource.keyword='s3.amazonaws.com' | where eventName.keyword like 'PutObject%' or eventName.keyword like 'UploadPart%' or eventName.keyword like 'UploadPartCopy%' | where requestParameters.bucketName.keyword like "web%" | fields userAgent, eventID

This is a really great feature. I had been looking for something like this for years!
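Both queries can also be submitted over the REST API. The Open Distro SQL plugin exposes _opendistro/_sql and _opendistro/_ppl endpoints; here is a minimal sketch using urllib3, with a placeholder endpoint and no authentication (adjust for your domain's access policy):

import json
import urllib3

http = urllib3.PoolManager()
host = "https://xxx.us-east-1.es.amazonaws.com"   # placeholder domain endpoint

# run a SQL query through the _opendistro/_sql endpoint
sql = {"query": "select userAgent, eventID from newcwl where eventName.keyword like 'PutObject%'"}
r = http.request("POST", host + "/_opendistro/_sql",
                 headers={"Content-Type": "application/json"}, body=json.dumps(sql))
print(r.status, r.data[:200])

# the PPL endpoint works the same way
ppl = {"query": "search source=newcwl | where eventName.keyword like 'PutObject%' | fields userAgent, eventID"}
r = http.request("POST", host + "/_opendistro/_ppl",
                 headers={"Content-Type": "application/json"}, body=json.dumps(ppl))
print(r.status, r.data[:200])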



December 13, 2020

 

get elastic data into pandas dataframe

These two lines of code will connect to an Elastic instance hosted in the Amazon cloud and read the data from a given index. The result behaves like a pandas DataFrame, with the JSON data flattened to look like an Excel spreadsheet.

import eland as ed

df = ed.DataFrame("https://xxx.us-east-1.es.amazonaws.com:443", es_index_pattern="cwl-2020.12.13")
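Note that the eland DataFrame is a lazy, index-backed view; if you want the documents materialized as a regular in-memory pandas DataFrame, eland_to_pandas does that (a sketch, using the same placeholder endpoint and index):

import eland as ed

df = ed.DataFrame("https://xxx.us-east-1.es.amazonaws.com:443", es_index_pattern="cwl-2020.12.13")

# pull the documents into a regular in-memory pandas DataFrame
pdf = ed.eland_to_pandas(df)
print(pdf.shape)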



March 26, 2020

 

Packetbeat to elastic server hosted by AWS

These are the 3 steps to follow if you want to push Packetbeat or Metricbeat data to an AWS Elasticsearch instance.

Make sure that your Elasticsearch instance has whitelisted the IP address of the server where you are installing Packetbeat.

1) Install
2) Configure
3) Start the service

1) Installation of beats depends upon the processor type, the OS, and whether you are connecting to AWS (which requires the OSS build):

arm + ubuntu + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-arm64.deb
dpkg -i packetbeat-oss-7.11.0-arm64.deb

arm + ubuntu + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-arm64.deb
dpkg -i packetbeat-7.11.0-arm64.deb

arm + redhat + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-aarch64.rpm
rpm -ivh packetbeat-oss-7.11.0-aarch64.rpm

arm + redhat + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-aarch64.rpm
rpm -ivh packetbeat-7.11.0-aarch64.rpm

Intel + ubuntu + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-amd64.deb
dpkg -i packetbeat-oss-7.11.0-amd64.deb

Intel + ubuntu + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-amd64.deb
dpkg -i packetbeat-7.11.0-amd64.deb

Intel + redhat + connect to aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.11.0-x86_64.rpm
rpm -ivh packetbeat-oss-7.11.0-x86_64.rpm

Intel + redhat + connect to non-aws
wget https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-7.11.0-x86_64.rpm
rpm -ivh packetbeat-7.11.0-x86_64.rpm


2) Open config file and change 2 settings as shown below:
vi /etc/packetbeat/packetbeat.yml
hosts: ["search-audit-ay52ddaf7q7zudixfmluji4osmx.us-east-1.es.amazonaws.com:443"]
protocol: "https"

If you are using a server on the local network, mention its address in the hosts list. The protocol option can be omitted if you are not using a secure HTTP connection.

Optionally, enable the "send_response" parameter to capture the query and its output as shown below:

packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true
  max_row_length: 4000

The max_row_length option allows you to capture lengthy queries and their large output.

Disable the X-Pack settings if you are connecting to AWS or using the open-source (OSS) build:

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
setup.ilm.enabled: false
setup.xpack.security.enabled: false
setup.xpack.graph.enabled: false
setup.xpack.watcher.enabled: false
setup.xpack.monitoring.enabled: false
setup.xpack.reporting.enabled: false

3) Start packetbeat service
cd /usr/bin/
./packetbeat  -e -c packetbeat.yml
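Once the service is running, you can confirm that events are reaching the cluster by searching the packetbeat index. A minimal sketch using the elasticsearch Python client; the endpoint is a placeholder and assumes your IP is allowed by the domain's access policy:

from elasticsearch import Elasticsearch

# placeholder endpoint; use your own AWS Elasticsearch domain
es = Elasticsearch("https://search-audit-xxx.us-east-1.es.amazonaws.com:443")

resp = es.search(index="packetbeat-*", size=1)
print(resp["hits"]["total"])   # count of captured events so far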



 

snapshot and restore elastic data to S3

Here are the steps to back up and restore Elasticsearch data on an AWS Elasticsearch instance.

1) Create IAM role
2) Use the boto script to create snapshot repo
3) Use Kibana to take the actual snapshot

All the steps are explained here...

https://forums.aws.amazon.com/message.jspa?messageID=930345#930345

Here is step-by-step guidance for the same.

1) Create IAM role:
Use the following CloudFormation template to create the role, and note the ARN of the role to be used in the boto script.

https://github.com/shantanuo/cloudformation/blob/master/updated/esbck.yml

2) Run this boto script:
Change the Access and Secret key. Also change the Elastic endpoint. Add the ARN created in the first step.

from boto.connection import AWSAuthConnection

class ESConnection(AWSAuthConnection):
    def __init__(self, region, **kwargs):
        super(ESConnection, self).__init__(**kwargs)
        self._set_auth_region_name(region)
        self._set_auth_service_name("es")

    def _required_auth_capability(self):
        return ["hmac-v4"]

client = ESConnection(
    region="us-east-1",
    host="search-training-foz7enh73fbg6lof23z7kbtn3y.us-east-1.es.amazonaws.com",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    is_secure=False,)

headers = {"Content-Type": "application/json"}
resp = client.make_request(
    method="PUT",
    headers=headers,
    path="/_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt",
    data='{"type": "s3","settings": { "bucket": "esbck-essnapshotbucket-c9e6d7fy1cbt","region": "us-east-1", "role_arn": "arn:aws:iam::1234567890:role/esbck-EsSnapshotRole-GJGMPH4DBMM3"}}')

resp.read()
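To verify that the repository was registered and the snapshot completed, the same ESConnection client can be reused to query the snapshot API (a sketch, continuing the script above):

# list all snapshots in the repository created above
resp = client.make_request(
    method="GET",
    path="/_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/_all",
)
print(resp.read())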

3) Take the backup from kibana

PUT /_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/snapshot_1

GET /_cat/indices

DELETE /cwl-2020.03.26

POST /_snapshot/esbck-essnapshotbucket-c9e6d7fy1cbt/snapshot_1/_restore

GET /_snapshot/

GET /_cat/snapshots/esbck-essnapshotbucket-c9e6d7fy1cbt



March 07, 2020

 

Using wildcards instead of match_phrase

Instead of using a "match_phrase" clause for every value, I can use a "wildcard" clause and replace sub-domains like config and cloudtrail with a star (*).

This will save typing additional clauses and keep the query short.

Old query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "sourceIPAddress": "1.2.3.4"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "5.6.7.8"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "config.amazonaws.com"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "cloudtrail.amazonaws.com"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
_____

New Improved query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "sourceIPAddress": "1.2.3.4"
          }
        },
        {
          "match_phrase": {
            "sourceIPAddress": "5.6.7.8"
          }
        },
        {
          "wildcard": {
            "sourceIPAddress.keyword": {
              "value": "*\\.amazonaws\\.com*"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
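The same query can be run from Python with the official client. A minimal sketch, using a placeholder endpoint and index pattern:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")    # placeholder endpoint

query = {
    "bool": {
        "should": [
            {"match_phrase": {"sourceIPAddress": "1.2.3.4"}},
            {"wildcard": {"sourceIPAddress.keyword": {"value": "*\\.amazonaws\\.com*"}}},
        ],
        "minimum_should_match": 1,
    }
}

resp = es.search(index="cwl-*", body={"query": query})    # placeholder index pattern
print(resp["hits"]["total"])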



November 24, 2019

 

sentiment analysis using twitter data

Here is a nice tutorial with a CloudFormation template to enable Twitter sentiment analysis.

https://aws.amazon.com/blogs/machine-learning/exploring-images-on-social-media-using-amazon-rekognition-and-amazon-athena/

And here is another blog post about reading text from images...

https://aws.amazon.com/blogs/machine-learning/building-an-nlp-powered-search-index-with-amazon-textract-and-amazon-comprehend/

If you want to import the data into Elasticsearch instead of Athena as suggested in the article, you will need to add the following code to the Lambda function.

import urllib3
http = urllib3.PoolManager()

host_senti = 'https://search-xxx.us-east-1.es.amazonaws.com/mysentiments/senti/'
host_enti = 'https://search-xxx.us-east-1.es.amazonaws.com/myentities/enti/'
host_rekon = 'https://search-xxx.us-east-1.es.amazonaws.com/myrekognitions/rekon/'

Add an http.request call after each firehose.put_record call, as shown below.

        firehose.put_record(DeliveryStreamName=sentiment_stream, Record= { 'Data' :json.dumps(sentiment_record) + '\n'})
        http.request('POST', host_senti+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(sentiment_record) )

            firehose.put_record(DeliveryStreamName=entity_stream, Record= { 'Data' : json.dumps(entity_record) + '\n'} )
            http.request('POST', host_enti+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(entity_record) )

                firehose.put_record(DeliveryStreamName=rekognition_stream, Record= { 'Data' :json.dumps(image_rekognition_record) + '\n'})
                http.request('POST', host_rekon+str(tweet['id']), headers = {'Content-Type': 'application/json'}, body = json.dumps(image_rekognition_record) )



November 22, 2019

 

elastic indexing using python built-in module

Here is another way of inserting a document into an Elasticsearch database. This is the same as using the requests module or curl. The only difference is that it uses the urllib3 module, which is available in the AWS Lambda runtime without any extra packaging.

import json
import urllib3

http = urllib3.PoolManager()

# index document "myid" into index "myyl2", type "myt"
host = 'https://search-test-xxx.us-east-1.es.amazonaws.com/myyl2/myt/myid'
some_data_structure = {"test": "this is one doc"}

def lambda_handler(event, context):
    resp = http.request('POST', host, headers={'Content-Type': 'application/json'}, body=json.dumps(some_data_structure))
    return resp.status
 



November 19, 2019

 

Packetbeat with geoip

1) Start elastic docker container
docker run --disable-content-trust --name myelastic -p 9200:9200 -p 5601:5601 -d shantanuo/elastic

2) Login to elastic container and install geoip plugin

docker exec -it myelastic sh

sh /home/elasticsearch/elasticsearch/bin/elasticsearch-plugin install ingest-geoip

3) Restart elastic container
docker restart myelastic

4) Create pipeline
PUT /_ingest/pipeline/geoip-info?pretty
{
  "description": "Add geoip info",
  "processors": [
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "client.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "destination.ip",
        "target_field": "destination.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "server.ip",
        "target_field": "server.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "host.ip",
        "target_field": "host.geo",
        "ignore_missing": true
      }
    }
  ]
}
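Before pointing packetbeat at the pipeline, it can be checked with the ingest simulate API. A small sketch, assuming the elastic container is reachable on localhost:9200:

import json
import urllib3

http = urllib3.PoolManager()

# push one fake document through the geoip-info pipeline and inspect the enriched output
doc = {"docs": [{"_source": {"client": {"ip": "8.8.8.8"}}}]}
resp = http.request(
    "POST",
    "http://localhost:9200/_ingest/pipeline/geoip-info/_simulate",
    headers={"Content-Type": "application/json"},
    body=json.dumps(doc),
)
print(resp.data.decode())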

5) Download config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/master/deploy/docker/packetbeat.docker.yml

6) Modify config file by adding pipeline parameter to output section

output.elasticsearch:
  pipeline: geoip-info

7) Start packetbeat container
docker run -d \
  --disable-content-trust \
  --name=packetbeaten2 \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  --entrypoint=bash \
  -it  docker.elastic.co/beats/packetbeat-oss:7.0.0

8) Login to the packetbeat container and start the packetbeat service
docker exec -it packetbeaten2 bash

./packetbeat --strict.perms=false -e \
  -E 'output.elasticsearch.hosts=["172.31.12.85:9200"]' \
  -E output.elasticsearch.pipeline=geoip-info



October 16, 2019

 

Create elastic instance using template

This CloudFormation template will create an Elasticsearch instance that can be used for testing.
It will not have a password and will be accessible from any IP address. That makes it insecure, but it is good enough for testing with dummy data.

Resources:
  ElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    Properties:
      DomainName: "testes"
      ElasticsearchClusterConfig:
        InstanceCount: "1"
        InstanceType: "m3.medium.elasticsearch"
      AccessPolicies:
        Statement:
          -
            Effect: "Allow"
            Principal:
              AWS: "*"
            Action: "es:*"
            Resource: !Sub "arn:aws:es:us-east-1:${AWS::AccountId}:domain/testes/*"
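The template can be deployed from the console or from Python with boto3; a sketch, where the local file name and stack name are illustrative:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# template saved locally as testes.yml (illustrative file name)
with open("testes.yml") as f:
    body = f.read()

cfn.create_stack(StackName="testes-stack", TemplateBody=body)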



October 01, 2019

 

Using embeddings for similarity search

Let’s suppose we have a large collection of questions and answers. A user can ask a question, and we want to retrieve the most similar question in our collection to help them find an answer.

* "zipping up files" should return "Compressing / Decompressing Folders & Files"
* "determine if something is an IP" should return "How do you tell whether a string is an IP or a hostname"
* "translate bytes to doubles" should return "Convert Bytes to Floating Point Numbers in Python"

In each case, that is the closest entry in the collection.

The indexed document should look like this...

{'user': '5156',
 'tags': ['xcode', 'git', 'osx', 'version-control', 'gitignore'],
 'questionId': '49478',
 'creationDate': '2008-09-08T11:07:49.953',
 'title': 'Git ignore file for Xcode projects',
 'acceptedAnswerId': '12021580',
 'type': 'question',
 'body': 'Which files should I include in .gitignore when using Git in conjunction with Xcode? ',
 'title_vector': [0.031643908470869064,
  -0.04750939458608627,
  -0.04847564920783043,
...

  0.001153663732111454,
  0.04351674020290375]}

The title_vector has exactly 512 elements for all records, irrespective of the number of words in the title. This is because we are using Google TensorFlow's "Universal Sentence Encoder".

https://tfhub.dev/google/universal-sentence-encoder/2

Here is the article on this topic:
https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch

And github repo:
https://github.com/jtibshirani/text-embeddings

If you want to test the application:

docker run --name text_embeddings  -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node"  -d shantanuo/textembeddings
docker exec -it text_embeddings bash
cd text-embeddings/
python3.6 src/main.py
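The retrieval itself can be done with a script_score query that compares the stored title_vector with the query embedding using cosine similarity. A sketch of such a query with the Python client, assuming Elasticsearch 7.3 and an illustrative index name; the query vector would normally come from the same Universal Sentence Encoder:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")     # placeholder endpoint

# placeholder; in practice this is the 512-dimension encoder output for the user's question
query_vector = [0.0] * 512

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['title_vector']) + 1.0",
            "params": {"query_vector": query_vector},
        },
    }
}

resp = es.search(index="posts", body={"size": 5, "query": script_query, "_source": ["title"]})  # illustrative index name
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])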



September 30, 2019

 

Create pandas dataframe using elastic beats data

Here are 5 steps to create a pandas dataframe using the packetbeat data

1) Start elastic container
2) Download and configure packetbeat config file
3) Start packetbeat container
4) Login to packetbeat container and start service
5) Import packetbeat data into pandas dataframe
_____

# start elastic
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node"   --log-driver json-file  -d elasticsearch:7.3.1

# download packetbeat config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.0/deploy/docker/packetbeat.docker.yml

# add send_response parameter to mysql section of config file and change host address
vi packetbeat.docker.yml

packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true

output.elasticsearch:
  hosts: 'some_site.com:9200'

# start a packetbeat container

docker run \
  --name pbeat \
  --disable-content-trust \
  --log-driver json-file \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  -d docker.elastic.co/beats/packetbeat:7.0.0

# login to packetbeat container and start packetbeat service
docker exec -it pbeat bash
cd /usr/share/packetbeat/
./packetbeat -e

# download the packetbeat data to pandas dataframe:

import pandas as pd
import numpy as np

import elasticsearch
from elasticsearch import helpers

es_client = elasticsearch.Elasticsearch("http://some_site.com:9200")

mylist = list()

# scan through every packetbeat document and collect the "client" section;
# documents without a client section get a placeholder row
for r in helpers.scan(es_client, index="packetbeat-7.0.0-2019.10.01-000001"):
    try:
        mylist.append(r["_source"]["client"])
    except KeyError:
        mylist.append({"ip": np.nan, "port": np.nan, "bytes": np.nan})

df = pd.DataFrame(mylist)
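If you want more than the client section, the whole _source of each document can be flattened into spreadsheet-like columns with pandas' json_normalize (a sketch, reusing the client and index above; json_normalize needs a reasonably recent pandas):

# flatten every nested field of each document into dotted column names
docs = [r["_source"] for r in helpers.scan(es_client, index="packetbeat-7.0.0-2019.10.01-000001")]
df_full = pd.json_normalize(docs)
print(df_full.columns)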



April 30, 2019

 

elastic watcher alerts

Here is the code to set up email alerts when an event occurs.

# install watcher plugin

bin/plugin install elasticsearch/license/latest
bin/plugin install elasticsearch/watcher/latest

# add email section to elasticsearch.yml file

watcher.actions.email.service.account:
    gmail_account:
        profile: gmail
        smtp:
            auth: true
            starttls.enable: true
            host: smtp.gmail.com
            port: 587
            user: shantanu.XXX
            password: XXX

# re-start elasticsearch and add a watcher document

curl -XPUT 'http://1.2.3.4:9200/_watcher/watch/log_event_watch' -d '{
  "metadata" : {
    "color" : "red"
  },
  "trigger" : {
    "schedule" : {
      "interval" : "5m"
    }
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : "packet*",
        "body" : {
          "size" : 0,
          "query" : { "match" : { "status" : "error" } }
        }
      }
    }
  },
  "condition" : {
    "script" : "return ctx.payload.hits.total > 5"
  },
  "actions" : {
    "email_administrator" : {
      "throttle_period": "15m",
      "email" : {
        "to" : "shantanu1234@gmail.com",
        "subject" : "Encountered {{ctx.payload.hits.total}} errors",
        "body" : "Too many error in the system, see attached data",
        "attachments" : {
          "attached_data" : {
            "data" : {
              "format" : "json"
            }
          }
        },
        "priority" : "high"
      }

    }
  }
}'



April 28, 2019

 

MySQL error log using Elastic stack

Here are 3 easy steps to log failed MySQL queries.
1) Download packetbeat config file
2) Edit config file to add "send_response" parameter
3) Start docker container

# download packetbeat config file
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.0/deploy/docker/packetbeat.docker.yml

# add send_response parameter to mysql
packetbeat.protocols.mysql:
  ports: [3306]
  send_response: true

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false
setup.ilm.enabled: false
setup.xpack.security.enabled: false
setup.xpack.graph.enabled: false
setup.xpack.watcher.enabled: false
setup.xpack.monitoring.enabled: false
setup.xpack.reporting.enabled: false

# start docker container
docker run \
  --user=packetbeat \
  --volume="$(pwd)/packetbeat.docker.yml:/usr/share/packetbeat/packetbeat.yml:ro" \
  --cap-add="NET_RAW" \
  --cap-add="NET_ADMIN" \
  --network=host \
  -d docker.elastic.co/beats/packetbeat:7.0.0 \
  --strict.perms=false -e \
  -E cloud.id=XXX \
  -E cloud.auth=elastic:XXX

# Once you get logs in Kibana, use a filter type:mysql and status:Error to extract failing queries.
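The same filter can also be applied programmatically. A minimal sketch with the Python client, using a placeholder endpoint (add http_auth for the cloud deployment):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder; for cloud use http_auth=("elastic", "...")

body = {
    "query": {"bool": {"must": [
        {"match": {"type": "mysql"}},
        {"match": {"status": "Error"}},
    ]}},
    "_source": ["query", "status"],
    "size": 10,
}

resp = es.search(index="packetbeat-*", body=body)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])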



April 28, 2018

 

How does elasticsearch work?

Let's assume we have 3 documents to be indexed in elastic.

d1: "This is the desert. There are no people in the desert. The Earth is large."

d2: "'Where are the people?' resumed the little prince at last. 'It's a little lonely in the desert…' ,' It is lonely when you're among people, too,' said the snake."

d3: " 'What makes the desert beautiful,' said the little prince, 'is that somewhere it hides a well' "
_____

Variables used:

λ : 0.1

1-λ : 0.9

tf("desert"): 4, total number of occurence of "desert" in the collection across all documents

Lc : 59, total number of tokens in collection

Mc("desert") = (4 + 1) / (59 + 1) = 5/60

Md("desert") = tf = 2/15

idf("desert") = log(total number of documents / number of documents in which keyword found) i.e. log(3/3) = log(1) = 0
_____

1) Classic:
Simplest form of similarity search
tf * idf
(2/15) * 0 = 0


2) BM25 similarity: 
the default Elasticsearch similarity: TF/IDF based, with built-in tf normalization

IDF * ((k + 1) * tf) / (k * (1.0 - b + b * (|d|/avgDl)) + tf)


3) Jelinek Mercer smoothing:
The actual formula is:
log(1 + ((1-λ) * Md) / (λ * Mc))

With the values substituted:
(1 + ((1-λ) * Md("desert")) / (λ * Mc("desert")))

this translates to:
(1 + ((0.9) * (2/15)) / (0.1 * (5/60)))

returns:
15.4

and log is:
math.log(15.4)
2.7343675094195836

By increasing λ (lambda), we increase the importance of the collection model and diminish the importance of the document model.
This is a good choice for longer queries.
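A quick check of the arithmetic above in Python:

import math

lam = 0.1        # λ : weight of the collection model
md = 2 / 15      # Md("desert") : term frequency in document d1
mc = 5 / 60      # Mc("desert") : smoothed collection frequency

score = math.log(1 + ((1 - lam) * md) / (lam * mc))
print(score)     # 2.7343675094195836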
_____

And this is how to test it:

https://gist.github.com/shantanuo/e203fb336ff0712f502c73a43cd85d75



February 25, 2018

 

Install and configure packetbeat to monitor mysql traffic

1) Install packetbeat
deb:
sudo apt-get install libpcap0.8
curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-6.2.2-amd64.deb
sudo dpkg -i packetbeat-6.2.2-amd64.deb

rpm:
sudo yum install libpcap
curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-6.2.2-x86_64.rpm
sudo rpm -vi packetbeat-6.2.2-x86_64.rpm

2) Make sure that the "query" property in the "mysql" section is "text" and not "keyword".

[root@localhost packetbeat]# vi packetbeat.template-es6x.json

        "mysql": {
          "properties": {
            "affected_rows": {
              "type": "long"
            },
             "query": {
              "type": "text"
            }
          }
        },
        "nfs": {
          "properties": {
            "minor_version": {


3) Change the host, protocol and password in the Elasticsearch output section of the config file. Enable template overwriting and make sure the 6.x version of the template will be loaded.

[root@localhost packetbeat]# vi packetbeat.yml

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://944fe807b7525eaf163f502e08a412c5.us-east-1.aws.found.io:9243"]
  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "elastic"
  password: "xxx"

 # Set to false to disable template loading.
  template.enabled: true

  # Template name. By default the template name is packetbeat.
  template.name: "packetbeat"

  # Path to template file
  template.path: "${path.config}/packetbeat.template.json"

  # Overwrite existing template
  template.overwrite: true

  # If set to true, packetbeat checks the Elasticsearch version at connect time, and if it
  # is 2.x, it loads the file specified by the template.versions.2x.path setting. The
  # default is true.
  template.versions.2x.enabled: false

  # If set to true, packetbeat checks the Elasticsearch version at connect time, and if it
  # is 6.x, it loads the file specified by the template.versions.6x.path setting. The
  # default is true.
  template.versions.6x.enabled: true

  # Path to the Elasticsearch 6.x version of the template file.
  template.versions.6x.path: "${path.config}/packetbeat.template-es6x.json"


4) Check the logs to confirm that everything is being loaded correctly.

[root@localhost packetbeat]# cat /var/log/packetbeat/packetbeat| more
2018-02-25T11:53:30+05:30 INFO Metrics logging every 30s
2018-02-25T11:53:30+05:30 INFO Loading template enabled for Elasticsearch 6.x. Reading template file: /etc/packetbeat/packetbeat.template-es6x.json
2018-02-25T11:53:30+05:30 INFO Elasticsearch url: https://944fe807b7525eaf163f502e08a412c.us-east-1.aws.found.io:9243
2018-02-25T11:53:30+05:30 INFO Activated elasticsearch as output plugin.
2018-02-25T11:53:30+05:30 INFO Publisher name: localhost.localdomain
2018-02-25T11:53:30+05:30 INFO Flush Interval set to: 1s
2018-02-25T11:53:30+05:30 INFO Max Bulk Size set to: 50
2018-02-25T11:53:30+05:30 INFO Process matching disabled
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: amqp
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: mongodb
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: mysql
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: nfs
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: pgsql
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: thrift
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: cassandra
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: dns
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: http
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: memcache
2018-02-25T11:53:30+05:30 INFO registered protocol plugin: redis
2018-02-25T11:53:30+05:30 INFO packetbeat start running.
2018-02-25T11:53:32+05:30 INFO Connected to Elasticsearch version 6.2.2
2018-02-25T11:53:32+05:30 INFO Trying to load template for client: https://944fe807b7525eaf163f502e08a412c.us-east-1.aws.found.io:9243
2018-02-25T11:53:32+05:30 INFO Existing template will be overwritten, as overwrite is enabled.
2018-02-25T11:53:32+05:30 INFO Detected Elasticsearch 6.x. Automatically selecting the 6.x version of the template
2018-02-25T11:53:33+05:30 INFO Elasticsearch template with name 'packetbeat' loaded

_____

Or use docker image:

[root@localhost ~]# docker run --cap-add=NET_ADMIN --network=host -e HOST="https://944fe807b7525eaf163f502e08a412c5.us-east-1.aws.found.io:9243" -e PASS="rzmYYJUdHVaglRejr8XqjIX7" shantanuo/packetbeat-agent

_____

# curl commands to connect to secure elastic (cloud)
curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/_aliases 

curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/_cat/indices/ 

curl --user "elastic:passwd"  https://xxx.us-east-1.aws.found.io:9243/packetbeat-6.6.2-2019.03.26/_search?pretty=true&q=*:*



February 11, 2018

 

Backup elastic data to S3

Here are the 5 steps to take a backup of an Elasticsearch index:

1) Install s3 plugin
2) Set access and secret key
3) create repo
4) Take backup
5) Test if backup was successful

cd /home/elasticsearch/elasticsearch/bin/

# sh elasticsearch-plugin install repository-s3

# sh elasticsearch-keystore create
Created elasticsearch keystore in /home/elasticsearch/elasticsearch/config

# sh elasticsearch-keystore add  s3.client.default.access_key
Enter value for s3.client.default.access_key:

# sh elasticsearch-keystore add s3.client.default.secret_key
Enter value for s3.client.default.secret_key:

# curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository1" -H'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "todel162"
  }
}'

# curl -XPUT "http://localhost:9200/_snapshot/my_s3_repository1/snap2" -H'Content-Type: application/json' -d'
{
   "indices": "products, index_1, index_2", 
   "ignore_unavailable": true,
   "include_global_state": false
}'

{"accepted":true}

#  curl http://localhost:9200/_cat/snapshots/my_s3_repository1?v

id     status start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
snap1 SUCCESS 1518343272  10:01:12   1518343272 10:01:12     52ms       0                 0             0            0
snap2 SUCCESS 1518343818  10:10:18   1518343818 10:10:18     64ms       0                 0             0            0
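Restoring works the same way in reverse: POST to the _restore endpoint of the snapshot. A sketch using urllib3, with an illustrative index name:

import json
import urllib3

http = urllib3.PoolManager()

body = {"indices": "products", "include_global_state": False}
resp = http.request(
    "POST",
    "http://localhost:9200/_snapshot/my_s3_repository1/snap2/_restore",
    headers={"Content-Type": "application/json"},
    body=json.dumps(body),
)
print(resp.status, resp.data.decode())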



June 29, 2017

 

Amazon User Activity Log

Here is a good article about how to integrate CloudTrail with Elasticsearch.

https://blog.powerupcloud.com/visualizing-aws-cloudtrail-logs-using-elk-stack-e3d3b399af43

Once I follow all these steps, I can log all Amazon activities into Elastic. This is super useful to know what changes are taking place in my account.



June 04, 2017

 

Packetbeat dashboards

In order to import the sample Kibana dashboards, we need to install Packetbeat first and then load the sample dashboards into the Elasticsearch server.

curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-5.4.1-x86_64.rpm

sudo rpm -vi packetbeat-5.4.1-x86_64.rpm

cd /usr/share/packetbeat/

./scripts/import_dashboards -es http://localhost:9200



June 03, 2017

 

Frequently used docker containers

Here are a few containers that I need most of the time.

1) elastic and kibana

a) elastic, kibana and packetbeat

docker run --disable-content-trust -p 9200:9200 -p 5601:5601 -d nshou/elasticsearch-kibana

docker run --cap-add=NET_ADMIN --net=host -e KIBANA="http://shantanuoak.com:5601" -e HOST="http://shantanuoak.com:9200" shantanuo/packetbeat-agent-unsecure

b) Connect to Elastic Cloud:

docker run --cap-add=NET_ADMIN --network=host -e KIBANA="https://6a16d771c4fc3be7f251c7c629a421e2.us-east-1.aws.found.io:9243" -e HOST="https://d322f42d01dc50c50dba0b446e6a1c0a.us-east-1.aws.found.io:9243" -e PASS="pwkbZXIB3VMPtr4wOnpLNi8c"  shantanuo/packetbeat-agent

c) Get the IP of the Elasticsearch host using the command hostname -i and then import the Metricbeat dashboards using docker

docker run docker.elastic.co/beats/metricbeat:5.5.0 ./scripts/import_dashboards  -es http://172.31.73.228:9200


2) python pandas using miniconda

docker run -i -t -p 8888:8888 -v /tmp:/tmp continuumio/miniconda3 /bin/bash -c "/opt/conda/bin/conda install jupyter -y && cd /tmp/ && /opt/conda/bin/jupyter notebook --NotebookApp.token='india' --notebook-dir=/tmp --ip='0.0.0.0' --port=8888 --no-browser --allow-root"

3) MySQL with a bug fixed and the IST timezone added

docker run -p 3399:3306 -e MYSQL_ROOT_PASSWORD=india3399 -v /my/custom3399:/etc/mysql/conf.d  -v /storage/mysql/datadir3399:/var/lib/mysql -d shantanuo/mysql:5.7

This container uses the config file as shown below:

# vi /my/custom3399/my.cnf
[mysqld]
server-id=1723169137

max_binlog_size=1024M
expire_logs_days=40
binlog_format=ROW
binlog_checksum=NONE

### enable master
# log-bin=/var/log/mysql/mysql-bin.log

### myisam only
# skip-innodb
# default-storage-engine=MyISAM
# default_tmp_storage_engine=MyISAM
# key-buffer-size=1G
# myisam_max_sort_file_size=40G
# myisam_sort_buffer_size=512M
# bulk_insert_buffer_size=1G
### disable strict sql mode
# sql-mode=''
# secure-file-priv = ""

### innodb setting
# innodb_buffer_pool_size=1G
# innodb_log_file_size=512M

# innodb_flush_method=O_DIRECT
# innodb_file_per_table
# innodb-flush-log-at-trx-commit = 2

# make sure temp directory has sufficient space
# tmpdir=/

4) Adminer container to manage mysql

docker run -p 80:80  -d  shantanuo/adminer /bin/bash -c "/usr/sbin/apache2ctl -D FOREGROUND "





November 01, 2016

 

sysdig for system admins

What about a tool for sysadmins that has all the utilities we use every day?
sysdig is a combination of strace + tcpdump + htop + iftop + lsof + transaction tracing.

It is an open source system-level exploration tool that captures system state and activity.

Here is how to install it...

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

And here are a few examples

Dump system activity to a file:
sysdig -w trace.scap

Show all the interactive commands executed inside a given container.
sysdig -pc -c spy_users container.name=wordpress1

View the top network connections for a single container.
sysdig -pc -c topconns container.name=wordpress1

See all the GET HTTP requests made by the machine
sudo sysdig -s 2000 -A -c echo_fds fd.port=80 and evt.buffer contains GET

See all the SQL select queries made by the machine
sudo sysdig -s 2000 -A -c echo_fds evt.buffer contains SELECT

See queries made via apache to an external MySQL server happening in real time
sysdig -s 2000 -A -c echo_fds fd.sip=192.168.30.5 and proc.name=apache2 and evt.buffer contains SELECT

More examples can be found here..

http://www.sysdig.org/wiki/sysdig-examples/#application


