The following template creates one AWS Systems Manager parameter resource and consumes the TTL stack as a nested resource, passing its own stack name as a reference and a TTL of 5 minutes. When the TTL has elapsed, a Lambda function called "DeleteCFNLambda" deletes the main stack along with the nested stack.
AWSTemplateFormatVersion: '2010-09-09'
Description: Demo stack, creates one SSM parameter and gets deleted after 5 minutes.
Resources:
  DemoParameter:
    Type: "AWS::SSM::Parameter"
    Properties:
      Type: "String"
      Value: "date"
      Description: "SSM Parameter for running date command."
      AllowedPattern: "^[a-zA-Z]{1,10}$"
  DeleteAfterTTLStack:
    Type: "AWS::CloudFormation::Stack"
    Properties:
      TemplateURL: 'https://datameetgeobkup.s3.amazonaws.com/cftemplates/cfn-stack-ttl.yaml'
      Parameters:
        StackName: !Ref 'AWS::StackName'
        TTL: '5'
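To launch the demo, a boto3 sketch like this should work (assuming the template above is saved locally as demo-ttl.yaml; since the nested TTL stack creates a Lambda function and an IAM role, CAPABILITY_IAM is most likely required):
import boto3

cfn = boto3.client("cloudformation")

with open("demo-ttl.yaml") as f:   # hypothetical local copy of the template above
    template_body = f.read()

# CAPABILITY_IAM is needed because the nested TTL stack creates IAM resources
cfn.create_stack(
    StackName="demo-ttl-stack",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],
)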
The external template that creates the Lambda function is available here:
https://github.com/aws-quickstart/quickstart-examples/blob/master/samples/cloudformation-stack-ttl/templates/cloudformation-stack-ttl.yaml
Source: https://aws.amazon.com/blogs/infrastructure-and-automation/scheduling-automatic-deletion-of-aws-cloudformation-stacks/
Labels: aws, aws_cloudformation
Streamlit is an excellent visualization tool with built-in pandas and Altair support.
It is very easy to get started:
# start the standard python + pandas container:
docker run -p 8501:8501 -d shantanuo/notebook
# log in to the container (replace 12345 with the container ID from docker ps):
docker exec -it 12345 bash
# install within the container:
pip install streamlit
# start the app:
streamlit run https://raw.githubusercontent.com/streamlit/demo-uber-nyc-pickups/master/app.py
# Visit the URL on port 8501 that we exposed with the -p parameter, e.g.:
http://3.228.39.49:8501/
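Writing your own app is just as simple. Here is a minimal sketch (a hypothetical app.py; save it inside the container and start it with streamlit run app.py):
# app.py - a minimal Streamlit app using pandas and a built-in chart
import pandas as pd
import streamlit as st

st.title("Hello Streamlit")
df = pd.DataFrame({"x": range(10), "y": [i * i for i in range(10)]})
st.line_chart(df.set_index("x"))   # interactive line chart
st.dataframe(df)                   # the raw table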
Labels: pandas, python, usability
If I have parent-child relations defined in a pandas DataFrame, how do I find the child elements of a given "boss"?
For example, in the data below, B reports to A while C and D report to B. How do I find B's assistants?
import pandas as pd
data = [["A", "B"], ["B", "C"], ["B", "D"], ["X", "Y"], ["X", "D"], ["Y", "Z"]]
df = pd.DataFrame(data=data, columns=["parent", "child"])
# This is a networkx problem!
import networkx as nx
dg = nx.from_pandas_edgelist(df, source="parent", target="child", create_using=nx.DiGraph)
nx.descendants(dg, "B")
{'C', 'D'}
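Note that nx.descendants returns every node reachable from "B", including indirect reports. For immediate children only, use the successors iterator; going the other way, nx.ancestors finds all of a node's bosses:
list(dg.successors("B"))   # direct children only
# ['C', 'D']
nx.ancestors(dg, "D")      # everyone above D in the hierarchy
# {'A', 'B', 'X'}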
Labels: pandas, python
This CloudFormation template will create an Elasticsearch domain that can be used for testing.
It will not have a password and will be accessible from any IP address. That is insecure, but good enough for testing with dummy data.
Resources:
  ElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    Properties:
      DomainName: "testes"
      ElasticsearchClusterConfig:
        InstanceCount: "1"
        InstanceType: "m3.medium.elasticsearch"
      AccessPolicies:
        Statement:
          - Effect: "Allow"
            Principal:
              AWS: "*"
            Action: "es:*"
            Resource: !Sub "arn:aws:es:us-east-1:${AWS::AccountId}:domain/testes/*"
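Once the domain is active, the open access policy means you can smoke-test it over plain HTTPS. A quick sketch (the endpoint below is a placeholder; copy the real one from the Elasticsearch console):
import requests

# placeholder - use the actual domain endpoint shown in the console
endpoint = "https://search-testes-xxxxxxxx.us-east-1.es.amazonaws.com"

# index a dummy document, then fetch it back
requests.put(endpoint + "/test-index/_doc/1", json={"message": "hello"})
print(requests.get(endpoint + "/test-index/_doc/1").json())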
Labels: aws, aws_cloudformation, elastic, usability
The ultimate goal of solving a system of linear equations is to find the values of the unknown variables. For example: a group took a trip on a bus, at $3 per child and $3.20 per adult, for a total of $118.40. They took the train back at $3.50 per child and $3.60 per adult, for a total of $135.20.
How many children, and how many adults were in the group?
With x children and y adults, this gives a system of two linear equations:
Bus: 3 * x + 3.2 * y = 118.4
Train: 3.5 * x + 3.6 * y = 135.2
https://www.mathsisfun.com/algebra/matrix-inverse.html
numpy's linalg module has inv (inverse) and dot methods:
import numpy as np

A = np.array([[3, 3.2], [3.5, 3.6]])
B = np.array([118.4, 135.2])
np.linalg.inv(A).dot(B)
# numpy also has a solve method that is easier to use:
np.linalg.solve(A, B)
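Both approaches return array([16., 22.]), i.e. 16 children and 22 adults, and a quick check confirms the solution satisfies both equations:
x = np.linalg.solve(A, B)
print(x)                       # [16. 22.]
print(np.allclose(A @ x, B))   # True - both ticket totals are reproduced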
Labels: machine_learning, python
Here is the code that loads scikit-learn's popular handwritten digits dataset and applies a Support Vector Classifier. The overall accuracy is about 97% for this multi-class classification problem, and that is not bad for 10 lines of code!
import pickle

from sklearn import datasets, svm, metrics

digits = datasets.load_digits()
n_samples = len(digits.images)
# flatten each 8x8 image into a 64-element feature vector
data = digits.images.reshape((n_samples, -1))

# train on the first half of the data, evaluate on the second half
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[: n_samples // 2], digits.target[: n_samples // 2])
predicted = classifier.predict(data[n_samples // 2 :])
expected = digits.target[n_samples // 2 :]
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

# Optionally, we can save the model using pickle.
with open('mymodel.pkl', 'wb') as file:
    pickle.dump(classifier, file, protocol=pickle.HIGHEST_PROTOCOL)
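The pickled model can be restored later and used for predictions without retraining:
with open('mymodel.pkl', 'rb') as file:
    restored = pickle.load(file)
print(restored.predict(data[-1:]))   # predict the label of the last image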
Labels: machine_learning, python
I was trying to launch an Amazon Redshift cluster in a specific VPC, but I could not select any VPC in the drop-down list, even though a VPC exists (it is just not the default one).
To resolve this, follow these three steps to create a cluster subnet group:
1) Click "Security" in the Redshift console.
2) On the Subnet Groups tab, click Create Cluster Subnet Group.
3) Specify a Name, Description, and VPC ID, and click the "add all the subnets" link.
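The same subnet group can be created with boto3; a sketch (the group name and subnet IDs below are placeholders):
import boto3

redshift = boto3.client("redshift")

# subnet IDs are placeholders - use the subnets of your VPC
redshift.create_cluster_subnet_group(
    ClusterSubnetGroupName="my-subnet-group",
    Description="Subnet group for the Redshift cluster",
    SubnetIds=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
)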
Labels: athena, aws, redshift
Here is the COPY command to load data from Parquet files into an existing table in Redshift (note the lowercase s3:// scheme and the FORMAT AS PARQUET clause, which is required for Parquet input):
copy favoritemovies from 's3://some_bucket/parquet_file/'
iam_role 'arn:aws:iam::0123456789012:role/RedshiftCopyUnload'
format as parquet;
The IAM role referenced above needs to be created first. To create an IAM role that allows Amazon Redshift to access AWS services:
1) Open the IAM Console.
2) In the navigation pane, choose Roles.
3) Choose Create role.
4) Choose AWS service, and then choose Redshift.
5) Under Select your use case, choose Redshift - Customizable and then choose Next: Permissions.
6) The Attach permissions policy page appears. Choose AmazonS3ReadOnlyAccess, AWSGlueConsoleFullAccess, and AmazonAthenaFullAccess.
7) Choose Next: Tags.
8) Choose Next: Review.
9) For Role name, type a name for your role, for example RedshiftCopyUnload. Choose Create role.
Now you need to attach the role to the current cluster.
1) Sign in to the AWS Management Console and open the Amazon Redshift console
2) In the navigation pane, choose Clusters.
3) In the list, choose the cluster that you want to manage IAM role associations for. Choose Manage IAM Roles.
4) Select your IAM role from the Available roles list to associate it with the cluster, or remove the current one.
5) Choose Apply Changes to update the IAM roles that are associated with the cluster.
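Attaching the role can also be scripted with boto3 (the cluster identifier and account ID below are placeholders):
import boto3

redshift = boto3.client("redshift")

# cluster identifier and role ARN are placeholders
redshift.modify_cluster_iam_roles(
    ClusterIdentifier="my-cluster",
    AddIamRoles=["arn:aws:iam::0123456789012:role/RedshiftCopyUnload"],
)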
1) https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html
2) https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html#copy-unload-iam-role-associating-with-clusters
Labels: athena, aws, redshift
Let’s suppose we have a large collection of questions and answers. A user can ask a question, and we want to retrieve the most similar question in our collection to help them find an answer.
* "zipping up files" should return "Compressing / Decompressing Folders & Files"
* "determine if something is an IP" should return "How do you tell whether a string is an IP or a hostname"
* "translate bytes to doubles" should return "Convert Bytes to Floating Point Numbers in Python"
In each case, the expected result is the entry in the collection closest in meaning to the query.
The indexed document should look like this...
{'user': '5156',
'tags': ['xcode', 'git', 'osx', 'version-control', 'gitignore'],
'questionId': '49478',
'creationDate': '2008-09-08T11:07:49.953',
'title': 'Git ignore file for Xcode projects',
'acceptedAnswerId': '12021580',
'type': 'question',
'body': 'Which files should I include in .gitignore when using Git in conjunction with Xcode? ',
'title_vector': [0.031643908470869064,
-0.04750939458608627,
-0.04847564920783043,
...
0.001153663732111454,
0.04351674020290375]}
The title_vector has exactly 512 elements for every record, irrespective of the number of words in the title. This is because we are using Google's "Universal Sentence Encoder" from TensorFlow Hub:
https://tfhub.dev/google/universal-sentence-encoder/2
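Generating such a vector is a one-liner once the model is loaded. A minimal sketch, assuming TensorFlow 2 and the TF2-compatible version 4 of the model (rather than the v2 module linked above):
import tensorflow_hub as hub

# v4 of the model works with the TF2-style hub.load API
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vectors = embed(["Git ignore file for Xcode projects"])
print(vectors.shape)   # (1, 512) - one 512-dimensional vector per sentence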
Here is the article on this topic:
https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
And the GitHub repo:
https://github.com/jtibshirani/text-embeddings
If you want to test the application:
docker run --name text_embeddings -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -d shantanuo/textembeddings
docker exec -it text_embeddings bash
cd text-embeddings/
python3.6 src/main.py
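On the search side, documents are scored by cosine similarity between the query embedding and title_vector. A sketch of what the Elasticsearch script_score query looks like (the index name "posts" and the embed function from the encoder sketch above are assumptions):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# embed the user's question with the same encoder used at indexing time
query_vector = embed(["zipping up files"]).numpy()[0].tolist()

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            # cosineSimilarity works on dense_vector fields (ES 7.3+)
            "source": "cosineSimilarity(params.query_vector, doc['title_vector']) + 1.0",
            "params": {"query_vector": query_vector},
        },
    }
}
resp = es.search(index="posts", body={"size": 5, "query": script_query, "_source": ["title"]})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])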
Labels: elastic, machine_learning