# Shantanu's Blog

Database Consultant

## Chrome Extensions

Some of the important Google Chrome extensions:

1) HTML5 Outliner: Generates a navigable page outline with heading and sectioning elements

3) Open Multiple URLs: Extract and open a list of URLs

4) Former2 Helper: Helps avoid CORS issues with former2.com while calling the AWS service API endpoints

5) Chrome Remote Desktop

6) Open in Colab: Open a GitHub-hosted notebook in Google Colab

7) Block image: Prevent images from downloading. You can toggle blocking on/off by clicking the extension icon on the Chrome toolbar.

8) Copy URLs: Copy the URLs of all tabs to the clipboard


## Pandas case study 32

Handling Outliers

Outliers can be removed or adjusted using statistical methods such as IQR, Z-score and data smoothing.

1) To calculate the IQR (interquartile range) of a dataset, first calculate its 1st quartile (Q1) and 3rd quartile (Q3), i.e. the 25th and 75th percentiles of the data, and then subtract Q1 from Q3.

import pandas as pd
data = [-2,8,13,19,34,49,50,53,59,64,87,89,1456]
df = pd.DataFrame(data)
df.columns = ['values']
ndf=df.describe().T
ndf['75%'] - ndf['25%']
# returns 45

To find outliers using the IQR, we define a multiplier (conventionally 1.5) that decides how far below Q1 or above Q3 a value must lie to be considered an outlier.

iqr = ndf['75%'] - ndf['25%']
higher_limit = ndf['75%'] + 1.5 * iqr
lower_limit = ndf['25%'] - 1.5 * iqr
# .iloc[0] extracts the scalar limit; plain [0] on a label-indexed Series is deprecated
df[(df['values'] > higher_limit.iloc[0]) | (df['values'] < lower_limit.iloc[0])]
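The same IQR rule can also be written without going through describe(), by asking for the quartiles directly. This is only a minimal sketch, assuming the default linear interpolation of Series.quantile; the names q1, q3, iqr, multiplier and outliers are illustrative and not part of the original snippet.

```python
import pandas as pd

data = [-2, 8, 13, 19, 34, 49, 50, 53, 59, 64, 87, 89, 1456]
df = pd.DataFrame({'values': data})

# Quartiles taken directly from the column (scalars, not a Series)
q1 = df['values'].quantile(0.25)
q3 = df['values'].quantile(0.75)
iqr = q3 - q1

# 1.5 is the conventional multiplier; a larger value flags fewer points
multiplier = 1.5
lower_limit = q1 - multiplier * iqr
higher_limit = q3 + multiplier * iqr

# Rows falling outside [lower_limit, higher_limit] are treated as outliers
outliers = df[(df['values'] < lower_limit) | (df['values'] > higher_limit)]
print(outliers)
```

Working with scalar limits avoids indexing into the describe() output and keeps the filter expression short.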

2) The Z-score tells how far a point is from the mean of the dataset in terms of standard deviations. A point whose absolute z-score is above 3 is considered an outlier.

from scipy import stats
df['z_score']=stats.zscore(df['values'])
# use the absolute value so that outliers far below the mean are caught too
df[df['z_score'].abs() > 3]
# returns
#     values   z_score
# 12    1456  3.454979
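To make the definition concrete, the z-score can also be computed by hand with pandas alone. This is a rough sketch, assuming the population standard deviation (ddof=0), which is the default used by scipy's zscore; the names mean, std and z_outliers are illustrative.

```python
import pandas as pd

data = [-2, 8, 13, 19, 34, 49, 50, 53, 59, 64, 87, 89, 1456]
df = pd.DataFrame({'values': data})

mean = df['values'].mean()
# ddof=0 gives the population standard deviation (pandas defaults to ddof=1)
std = df['values'].std(ddof=0)

# Distance of each point from the mean, measured in standard deviations
df['z_score'] = (df['values'] - mean) / std

# Keep rows whose absolute z-score exceeds the threshold of 3
z_outliers = df[df['z_score'].abs() > 3]
print(z_outliers)
```

Note that pandas' own std() uses ddof=1 unless told otherwise, so leaving out ddof=0 would give slightly smaller z-scores than scipy reports.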

3) Data smoothing is a process to adjust spikes and peaks. If your current value is 13, the previous value is 8 and the smoothing level is 0.6, then the smoothed value is 11, given by
13*0.6 + (1-0.6)*8
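The arithmetic above can be wrapped in a small helper to try other smoothing levels. This is just an illustrative sketch; the function name smooth and its parameters are hypothetical and not part of pandas.

```python
def smooth(current, previous, alpha):
    """Blend the current value with the previous one; alpha weights the current value."""
    return alpha * current + (1 - alpha) * previous


# The example from the text: current 13, previous 8, smoothing level 0.6
smoothed = smooth(current=13, previous=8, alpha=0.6)
print(round(smoothed, 6))

# A lower alpha leans more on the previous value, so spikes are damped harder
damped = smooth(current=1456, previous=89, alpha=0.1)
print(round(damped, 6))
```

With alpha close to 1 the smoothed value follows the raw data closely; with alpha close to 0 it changes slowly and spikes are flattened.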

The pandas smoothing function (ewm) can be used to calculate the exponentially weighted moving average at different alpha levels.

df['ewm_alpha_1']=df['values'].ewm(alpha=0.1).mean()
df['ewm_alpha_3']=df['values'].ewm(alpha=0.3).mean()
df['ewm_alpha_6']=df['values'].ewm(alpha=0.6).mean()
df
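One detail worth knowing: by default ewm uses adjust=True, which normalises the weights, so its output does not exactly follow the simple recursive formula shown earlier. The sketch below, assuming adjust=False is acceptable for your use case, checks that this setting reproduces the recursion step by step; the names ewm_recursive and manual are illustrative.

```python
import pandas as pd

data = [-2, 8, 13, 19, 34, 49, 50, 53, 59, 64, 87, 89, 1456]
values = pd.Series(data, dtype='float64')
alpha = 0.6

# adjust=False makes pandas use: y[t] = alpha * x[t] + (1 - alpha) * y[t-1]
ewm_recursive = values.ewm(alpha=alpha, adjust=False).mean()

# The same recursion written out by hand, seeded with the first observation
manual = [values.iloc[0]]
for x in values.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])
manual = pd.Series(manual)

# Largest difference between the two; it should be negligible
max_diff = (ewm_recursive - manual).abs().max()
print(max_diff)
```

Here each smoothed value is blended with the previous smoothed value rather than the previous raw value, which is what lets a single spike fade out gradually.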

https://kanoki.org/2020/04/23/how-to-remove-outliers-in-python/
