Here is how we can normalize pandas data. This JSON file has a "channels" column where each value is a list. We need to build a new dataframe with the list items as column names and 1/0 indicator values.
import pandas as pd
df = pd.read_json('https://raw.githubusercontent.com/nugroho1234/starbucks-project/master/data/portfolio.json',
                  orient='records', lines=True)
# Each value in the "channels" column is a list such as ['email', 'mobile', 'web']
df.channels
# Spread each list into its own columns (one column per list position)
df.channels.apply(pd.Series)
# Stack into a single Series indexed by (row, position)
df.channels.apply(pd.Series).stack()
# One-hot encode the stacked values
pd.get_dummies(df.channels.apply(pd.Series).stack())
# Collapse back to one row per original record (level 0 of the index)
pd.get_dummies(df.channels.apply(pd.Series).stack()).sum(level=0)
# Attach the indicator columns to the original dataframe
final_df = pd.concat([df, pd.get_dummies(df['channels'].apply(pd.Series).stack()).sum(level=0)], axis=1)
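Note: newer pandas releases have removed Series.sum(level=...), so on those versions the same result can be sketched with a groupby instead (my assumption of the equivalent call, not from the original post):

# groupby(level=0).sum() is the stated replacement for the deprecated sum(level=0)
final_df = pd.concat([df, pd.get_dummies(df['channels'].apply(pd.Series).stack()).groupby(level=0).sum()], axis=1)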
Labels: pandas
There is a new python module called pyjanitor that takes care of cleaning dataframe column names.
!conda install --yes -c conda-forge pyjanitor
1) strip_underscores: remove the leading and trailing underscores
2) case_type: lower, upper, preserve
3) remove_special: remove special characters like %$@
4) limit_column_characters: restrict the column name width
For example, a column name like "_first_@NAME%_" will become "first_nam".
import janitor
df = df.clean_names(strip_underscores=True, case_type='lower',
                    remove_special=True).limit_column_characters(9)
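Here is a quick sketch of the example above on a throwaway dataframe (the column name is the one from the post; the data values are made up):

import pandas as pd
import janitor  # registers clean_names / limit_column_characters on DataFrame
demo = pd.DataFrame({'_first_@NAME%_': [1, 2]})
demo = demo.clean_names(strip_underscores=True, case_type='lower',
                        remove_special=True).limit_column_characters(9)
print(demo.columns)  # expected: Index(['first_nam'], dtype='object')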
This is a must-have tool for anyone who uses pandas dataframes.
Labels: pandas, usability
This Lambda function will assign the given Elastic IP address to the instance launched by the Spot Fleet request.
def lambda_handler(event, context):
    from boto3 import client as boto3_client
    ec2_client = boto3_client('ec2')
    myip = '118.210.57.140'
    # Find the running instance tagged with the spot fleet request id
    for i in ec2_client.describe_instances()['Reservations']:
        for x in i['Instances']:
            if x['Tags'][0]['Value'] == 'sfr-51319954-4575-4e22-815b-42':
                if x['State']['Name'] == 'running':
                    myr = x['InstanceId']
                    print(myr)
                    # Attach the Elastic IP to the matched instance
                    ec2_client.associate_address(InstanceId=myr, PublicIp=myip, AllowReassociation=True)
    return "IP assigned"
Labels: aws, aws_lambda
Here are the commands to launch instances using "Launch Templates". We will launch 2 Spot Instances using the fleet option, then modify the capacity and finally terminate all instances at once using these commands...
aws ec2 create-launch-template --cli-input-json file://ec2-fleet-launch-template.json
aws ec2 create-fleet --cli-input-json file://ec2-fleet-ec2-spot-turbo-boost.json
aws ec2 describe-fleets --fleet-ids $FLEETID
aws ec2 describe-fleet-history --fleet-id $FLEETID --start-time 2018-01-01
aws ec2 modify-fleet --fleet-id $FLEETID --target-capacity-specification TotalTargetCapacity=20
aws ec2 delete-fleets --fleet-ids $FLEETID --terminate-instances
We need to use either the us-east-1 (Virginia) or ap-south-1 (Mumbai) region. The region can be set using the "aws configure" command.
Here are the 2 config files used:
# cat ec2-fleet-launch-template.json
{
    "LaunchTemplateName": "ec2-fleet2",
    "VersionDescription": "Demo Launch Template for EC2 Fleet",
    "LaunchTemplateData": {
        "ImageId": "ami-009d6802948d06e52",
        "KeyName": "dec15a",
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [
                    {
                        "Key": "Name",
                        "Value": "Demo EC2 Fleet1"
                    }
                ]
            }
        ]
    }
}
# cat ec2-fleet-ec2-spot-turbo-boost.json
{
    "SpotOptions": {
        "AllocationStrategy": "diversified",
        "InstanceInterruptionBehavior": "terminate"
    },
    "ExcessCapacityTerminationPolicy": "termination",
    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "ec2-fleet2",
                "Version": "1"
            },
            "Overrides": [
                {
                    "InstanceType": "c3.large"
                }
            ]
        }
    ],
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 2,
        "OnDemandTargetCapacity": 0,
        "SpotTargetCapacity": 2,
        "DefaultTargetCapacityType": "spot"
    },
    "TerminateInstancesWithExpiration": true,
    "Type": "maintain",
    "ReplaceUnhealthyInstances": true,
    "TagSpecifications": [
        {
            "ResourceType": "fleet",
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "Demo EC2 Fleet"
                }
            ]
        }
    ]
}
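The same fleet request can also be sketched in Python with boto3 (my assumption: the JSON file above sits in the working directory); create_fleet accepts the same keys as the CLI input file:

import json
import boto3
ec2 = boto3.client('ec2')
# Load the CLI input file and pass its keys straight to the API call
with open('ec2-fleet-ec2-spot-turbo-boost.json') as f:
    config = json.load(f)
response = ec2.create_fleet(**config)
print(response['FleetId'])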
Labels: aws
CloudFormation is an example of "Infrastructure as Code". I can easily create resources and have their state tracked as a stack. For example, I can save this as a template file to create a subdomain called training2 and point it to the given IP address.
{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "dnstrainingshantanuoakcom": {
            "Type": "AWS::Route53::RecordSetGroup",
            "Properties": {
                "HostedZoneName": "shantanuoak.com.",
                "RecordSets": [
                    {
                        "Name": "training2.shantanuoak.com.",
                        "Type": "A",
                        "TTL": "300",
                        "ResourceRecords": [
                            "18.210.57.140"
                        ]
                    }
                ]
            }
        }
    }
}
Though this can also be done using boto, the difference is that I can simply delete the stack to revert everything.
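A minimal boto3 sketch of that workflow (the stack name and template file name are my own placeholders, assuming the JSON above is saved as route53.json):

import boto3
cfn = boto3.client('cloudformation')
# Create the stack from the template shown above
with open('route53.json') as f:
    cfn.create_stack(StackName='training2-dns', TemplateBody=f.read())
# Reverting is a single call: the DNS record goes away with the stack
# cfn.delete_stack(StackName='training2-dns')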
Labels: aws, usability
There are times when we need to change the case of columns, remove special characters like $%, and limit the number of characters if the column name is too lengthy.
Welcome to pyjanitor.
!conda install --yes -c conda-forge pyjanitor
import janitor
df.clean_names(case_type="upper", remove_special=True).limit_column_characters(3)
Labels: pandas, python
Here are 5 steps to use a Dockerized Django installation, as explained in this article.
https://testdriven.io/dockerizing-django-with-postgres-gunicorn-and-nginx
1) Install docker compose:
sudo curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
2) Download the docker files:
git clone https://github.com/testdrivenio/django-on-docker.git
3) Edit ALLOWED_HOSTS to add your site:
vi /home/ec2-user/django-on-docker/app/hello_django/settings.py
ALLOWED_HOSTS = ['shantanuoak.com']
4) Use compose to start relevant containers
cd django-on-docker
docker-compose up -d --build
5) Visit your site:
http://shantanuoak.com:1337
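Once the containers are up, a quick smoke test from Python (the hostname and port are the ones used above):

import requests
# Expect HTTP 200 from the nginx container once the stack is running
r = requests.get('http://shantanuoak.com:1337')
print(r.status_code)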
Labels: docker
How do I extract 1 to 3 digit numbers? I do not need numbers with 4+ digits, e.g. 2009.
>>> re.findall(r'\d{1,3}', '1 2 134 2009')
['1', '2', '134', '200', '9']
The expected result is as follows:
['1', '2', '134']
There are 2 options: add word boundaries to the regex, or split the string and filter by length.
re.findall(r'\b\d{1,3}\b', '1 2 134 2009')
[n for n in numbers.split() if len(n) <= 3]
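A runnable version of both options (here numbers is assumed to hold the sample string):

import re
numbers = '1 2 134 2009'
# Option 1: \b stops the regex from matching inside a longer digit run
print(re.findall(r'\b\d{1,3}\b', numbers))          # ['1', '2', '134']
# Option 2: split on whitespace and keep tokens of at most 3 characters
print([n for n in numbers.split() if len(n) <= 3])  # ['1', '2', '134']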
Labels: pandas
We can use group by to find the max and min values and then convert the result to a dictionary.
import pandas as pd
df = pd.DataFrame(my_list)  # my_list: a list of dicts with Equip, Job and Date keys
df['Date'] = pd.to_datetime(df['Date'])
# First and last date per (Equip, Job) pair
groups = df.groupby(['Equip', 'Job']).agg({'Date': [min, max]}).reset_index()
groups.columns = ['Equip', 'Job', 'First', 'Last']
groups.to_dict(orient='records')
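To make this self-contained, my_list can be any list of dicts with those keys; here is a made-up sample (my own rows, not from the question):

# Hypothetical sample data; the real rows come from the Stack Overflow question below
my_list = [
    {'Equip': 'E1', 'Job': 'J1', 'Date': '2018-01-01'},
    {'Equip': 'E1', 'Job': 'J1', 'Date': '2018-01-05'},
    {'Equip': 'E2', 'Job': 'J2', 'Date': '2018-02-01'},
]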
https://stackoverflow.com/questions/53144837/pythonic-way-of-collapsing-grouping-a-list-to-aggregating-max-min/
Labels: pandas