Shantanu's Blog
Database Consultant
November 17, 2010
Unix Case Study 26
Left outer join SQL like query from shell
Basically I have two files:
frequency.txt: (multiple lines, space separated file containing words and a frequency)
de 1711
a 936
et 762
la 530
les 482
pour 439
le 425
...
and i have a file containing "prohibited" words:
stopwords.txt: (one single line, space separated file)
au aux avec le ces dans ...
So I want to delete from frequency.txt all the lines containing a word found in stopwords.txt.
How could I do that? I'm thinking that it could be done with awk... something like
awk 'match($0,SOMETHING_MAGICAL_HERE) == 0 {print $0}' frequency.txt > new.txt
but I'm not really sure... any ideas? Thanks in advance
http://stackoverflow.com/questions/3978626/shell-to-filter-prohibited-words-on-a-file
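One possible answer (a sketch, using the file names from the question): read the stop words into an awk array in a first pass, then print only the frequency lines whose word is not in that array.

```shell
# Sample inputs matching the question (illustrative data only)
printf 'de 1711\na 936\nle 425\n' > frequency.txt
printf 'au aux avec le ces dans\n' > stopwords.txt

# First pass (NR==FNR) stores every word of stopwords.txt in an array;
# second pass prints only frequency.txt lines whose word is not a stop word.
awk 'NR==FNR {for (i = 1; i <= NF; i++) stop[$i]; next}
     !($1 in stop)' stopwords.txt frequency.txt > new.txt

cat new.txt
```

With the sample data above, "le 425" is dropped because "le" appears in stopwords.txt.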
Labels: unix case study
March 14, 2010
Unix Case Study 25
If I want to change the sequence of columns in a comma delimited file, I can either use sed or awk.
# cat mycomma.txt
a,b,c,d,e
12,32,43,54,54
as,ewr,tre,yy,dfg
But there is another way to do this...
#!/bin/bash
input=$1
while read -r line
do
IFS=, read -r f1 f2 f3 f4 f5 <<<"$line"
# quote the fields to preserve any embedded whitespace
echo "$f5 $f1 $f2 $f3 $f4"
done <"$input"
# sh testme.sh mycomma.txt
e a b c d
54 12 32 43 54
dfg as ewr tre yy
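For comparison, the same reordering can be written as a one-line awk sketch over the sample file above:

```shell
# Sample file from the post
printf 'a,b,c,d,e\n12,32,43,54,54\nas,ewr,tre,yy,dfg\n' > mycomma.txt

# Split on commas and print field 5 first, then fields 1-4
awk -F, '{print $5, $1, $2, $3, $4}' mycomma.txt
```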
Labels: unix case study
March 04, 2010
UNIX case study - 24
Using shell like SQL
Let's say I have a csv file like this:
a,b1,12,
a,b1,42,
d,e1,12,
r,12,33,
I want to use grep to return only the rows where the third column = 12. So it would return:
a,b1,12,
d,e1,12,
but not:
r,12,33,
Any ideas for a regular expression that will allow me to do this?
http://stackoverflow.com/questions/2373885/searching-a-csv-file-using-grep
_____
while IFS="," read a b c d
do
case "$c" in
12) echo "$a,$b,$c,$d"
esac
done <"file"
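To answer the original question with a regular expression: anchor on the first two comma-delimited fields, then require 12 as the third. An awk version that compares the field directly is shown alongside (a sketch over the sample data).

```shell
# Sample data from the question
printf 'a,b1,12,\na,b1,42,\nd,e1,12,\nr,12,33,\n' > file.csv

# grep: skip two comma-delimited fields, then the third must be exactly 12
grep '^[^,]*,[^,]*,12,' file.csv

# awk equivalent: compare the third field directly
awk -F, '$3 == 12' file.csv
```

Note the grep pattern relies on the trailing comma after the third field, which the sample rows all have.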
Labels: unix case study
January 05, 2010
UNIX case study - 23
How do I copy the entire directory structure (without files)?
For e.g. I want to copy the directory names below /home directory and paste it in the /test directory.
// copy directory structure
find /home/ -type d -print | sed 's;/home/;/test/;' | sed 's/^/mkdir /' | sh -x
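The sed/mkdir pipeline above breaks on directory names containing spaces. A more defensive sketch (src and dst are hypothetical stand-ins for /home and /test) passes each directory name as a single argument:

```shell
# Demo tree; "src" and "dst" stand in for /home and /test
mkdir -p src/a/b src/c dst

# find hands each directory path to sh as one argument,
# so names with spaces survive; mkdir -p recreates the path under dst
( cd src && find . -type d -exec sh -c 'mkdir -p "../dst/$1"' _ {} \; )

find dst -type d | sort
```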
Labels: unix case study
December 06, 2009
UNIX case study - 22
It is possible to emulate a two-dimensional array in a shell script: just use two nested for - do, done loops. It was really simple, but it took me a while to figure it out.
#!/bin/sh
set -x
for i in "yahoo.com" "facebook.com" "google.com" "reddit.com" "cnet.com" "bbc.co.uk"
do
for j in "4.2.2.2" "8.8.8.8" "208.67.222.222"
do
echo $j $i `dig @$j $i | grep Query | awk -F ":" '{print $2}'`
done
done
Labels: linux tips, unix case study
August 11, 2009
More examples of Sed and Awk
// select only the "MainRecord" lines, i.e. stop at the first blank line or "Footer" record
$ cat awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
$ sed '/^$/q' awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
$ sed -n '/^$/q;p' awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
$ awk 'NF && $0 !~ /"Footer/' awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
$ awk '!NF { exit } 1' awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
$ awk ' NF {print} !NF {exit}' awktest.txt
MainRecord1 "115494",","FAELD","CT","
MainRecord2 "245774"," ,"","Gp"
MainRecord3 "165295","Aive","AHS","S",""
MainRecord4 "256254"," MOTOR "
_____
// to select records whose minutes field is in the range 00 to 04
$ cat awktest.txt
date 18:00:00
date 18:01:02
date 18:02:00
date 19:06:00
date 18:03:00
date 18:05:00
$ awk -F ":" '{if ($2<=4) print $0}' awktest.txt
date 18:00:00
date 18:01:02
date 18:02:00
date 18:03:00
_____
$ cat file1.txt
abc|0|xyz
123|129|opq
def|0|678
890|pqw|sdf
// print record where second column has value of 0
$ awk -F'|' '$2=="0"' file1.txt
abc|0|xyz
def|0|678
_____
$ cat file1.txt
abc|0|xyz
123|129|opq
def|0|678
890|pqw|sdf
// replace the character 'a' with Apostrophe
$ sed -e "s/a/'/" file1.txt
'bc|0|xyz
123|129|opq
def|0|678
890|pqw|sdf
$ tr a "'" < file1.txt
'bc|0|xyz
123|129|opq
def|0|678
890|pqw|sdf
_____
$ cat infile.txt
|A|21|B1||1.1|
|A|21|C|RAGH|1.1|
|A|21|D1||1.1|
|A|21|C|YES|1.1
// replace blank cells with "NA"
$ awk 'BEGIN { FS="|"; OFS="|" } { if ($5=="") $5="NA"; print }' infile.txt
|A|21|B1|NA|1.1|
|A|21|C|RAGH|1.1|
|A|21|D1|NA|1.1|
|A|21|C|YES|1.1
$ awk -F"|" '$5 == "" {$5="NA"; print; next} {print}' OFS="|" infile.txt
|A|21|B1|NA|1.1|
|A|21|C|RAGH|1.1|
|A|21|D1|NA|1.1|
|A|21|C|YES|1.1
$ perl -ne '{s/(?<=\|)(?=\|)/NA/g;print;}' infile.txt
|A|21|B1|NA|1.1|
|A|21|C|RAGH|1.1|
|A|21|D1|NA|1.1|
|A|21|C|YES|1.1
Labels: unix case study
UNIX case study - 21
Counting and numbering Duplicates
How do I add a counter for the duplicate values?
$ cat mysort.txt
yan
tar
tar
man
ban
tan
tub
tub
tub
$ awk '{print $1,word[$1]++}' mysort.txt
yan 0
tar 0
tar 1
man 0
ban 0
tan 0
tub 0
tub 1
tub 2
Labels: unix case study
UNIX case study - 20
Secondary sort
$ cat mysort.txt
004002004545454000001
041002004545222000002
006003008751525000003
007003008751352000004
006003008751142000005
004001005745745000006
$ sort -k 1,5 mysort.txt
004001005745745000006
004002004545454000001
006003008751142000005
006003008751525000003
007003008751352000004
041002004545222000002
I want to sort the file by positions 1-5, with a secondary sort by the last positions of the file, 16-21.
the result should be like this (file2) :
004002004545454000001
004001005745745000006
006003008751525000003
006003008751142000005
007003008751352000004
041002004545222000002
http://www.unix.com/shell-programming-scripting/115911-sort-text-file.html
Ans:
sort -k 1.1,1.5 -k 1.16,1.21 mysort.txt
Labels: unix case study
UNIX case study - 19
Only printing certain rows
I mainly work with altering columns with awk but now I encountered a problem with dealing with rows.
So what I want to do is only print rows that start with a specific name. For example:
## joe jack john
ty1 3 4
ty1 5 6
ty2 4 7
tym 5 6
tyz 7 9
Basically what I want to do is get rid of the rows with ##, tym, and tyz. So I only want to print ty1 and ty2.
So the output will look like this:
ty1 3 4
ty1 5 6
ty2 4 7
http://www.unix.com/shell-programming-scripting/116087-only-printing-certain-rows.html
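One hedged answer: keep only the rows whose first field is "ty" followed by digits, which drops the ## header line as well as tym and tyz.

```shell
# Sample data from the question
cat > rows.txt <<'EOF'
## joe jack john
ty1 3 4
ty1 5 6
ty2 4 7
tym 5 6
tyz 7 9
EOF

# $1 must be "ty" plus one or more digits
awk '$1 ~ /^ty[0-9]+$/' rows.txt
```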
Labels: unix case study
UNIX case study - 18
Changing the sequence number
I have data as follows:
1 400
2 239
3 871
4 219
5 543
6 ...
7 ...
.. ...
.. ...
99 818
100 991
I want to replace the sequence number (column 1) so that it starts from 150. The output should look like this:
150 400
151 239
152 871
153 219
154 543
...
...
Can anyone tell me the AWK code for this?
http://www.unix.com/shell-programming-scripting/116062-changing-sequence-number.html
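A minimal awk sketch: NR is the current line number, so adding 149 renumbers the first column starting at 150.

```shell
# Sample data from the question
printf '1 400\n2 239\n3 871\n4 219\n5 543\n' > seq.txt

# NR is awk's line counter, so NR+149 yields 150, 151, 152, ...
awk '{print NR + 149, $2}' seq.txt
```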
Labels: unix case study
July 16, 2009
UNIX case study - 17
convert file names to upper case using tr command in Unix
Need to convert file names to upper case using tr command in Unix.
In a folder -> /apps/dd01/misc
there are two files like:
pi-abcd.pdf
pi-efgh.pdf
The output should be like:
pi-ABCD.pdf
pi-EFGH.pdf
I have used the command to work for a single file at a time like:
mv *.pdf "pi*-`ls -ltr *.pdf | awk '{print ($9)}' | cut -c 4-7 | tr '[a-z]' '[A-Z]'`.pdf"
How do we modify the above command so that it works out for multiple files in a folder as above.
http://www.unix.com/shell-programming-scripting/114463-convert-file-names-upper-case-using-tr-command-unix.html
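One way to handle every file in the folder is a plain for loop with parameter expansion, instead of the ls/awk pipeline (a sketch, assuming all names follow the pi-xxxx.pdf pattern):

```shell
# Sample files from the question
touch pi-abcd.pdf pi-efgh.pdf

# Strip the fixed prefix and suffix, upper-case the stem, rename
for f in pi-*.pdf; do
    stem=${f#pi-}        # abcd.pdf
    stem=${stem%.pdf}    # abcd
    mv "$f" "pi-$(printf '%s' "$stem" | tr '[:lower:]' '[:upper:]').pdf"
done

ls pi-*.pdf
```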
Labels: unix case study
April 13, 2009
UNIX case study - 16
Forcing pipe output
A pipe does not necessarily pass information on to the next command unless the next command "listens" to the input. In the following example, we are getting the file list from the current directory.
# locate my.cnf | ls -lt | head
total 19136
-rw-r--r-- 1 root root 9557 Apr 13 14:52 1304.txt
drwxr-xr-x 8 root root 4096 Apr 13 13:19 Desktop
drwxr-xr-x 14 root root 4096 Apr 13 12:12 firefox
You need to use the xargs command to hand the output of the first command, as arguments, to a second command that does not read standard input.
# locate my.cnf | xargs ls -lt
-rw-r--r-- 1 root root 5299 Mar 6 12:35 /etc/mysql/my.cnf
-rw-r--r-- 1 root root 4321 Jan 21 10:16 /root/daily/todel/todel_shantanu/my.cnf
-rw-r--r-- 1 root root 3017 Aug 13 2008 /etc/my_cnf_bckup/my.cnf.rpmsave
Labels: linux tips, unix case study
April 09, 2009
UNIX case study - 15
Grepping lines in order
I am stuck on a simple issue but couldn't find a simple solution. If you have any ideas please help.
I have two files : -
FILE1
Tue 09/12 Lindsey
Wed 09/13 Randy
Thu 09/14 Susan
Fri 09/15 Randy
Sat 09/16 Lindsey
Sun 09/17 Susan
FILE2
Fri
09/12
Sat
09/13
I want to grep all the lines from FILE1 which contain the patterns in FILE2, in the same order as FILE2.
I tried ' grep -f FILE2 FILE1'
The output is -
Tue 09/12 Lindsey
Wed 09/13 Randy
Fri 09/15 Randy
Sat 09/16 Lindsey
I am getting all the desired lines from FILE1 but NOT in the order present in FILE2.
Required OUTPUT is : -
Fri 09/15 Randy
Tue 09/12 Lindsey
Sat 09/16 Lindsey
Wed 09/13 Randy
http://www.unix.com/shell-programming-scripting/106745-how-grep-lines-particular-order.html
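grep -f reports matches in FILE1 order; to preserve FILE2's order, run one grep per pattern (a sketch over the sample files):

```shell
# Sample files from the question
cat > FILE1 <<'EOF'
Tue 09/12 Lindsey
Wed 09/13 Randy
Thu 09/14 Susan
Fri 09/15 Randy
Sat 09/16 Lindsey
Sun 09/17 Susan
EOF
printf 'Fri\n09/12\nSat\n09/13\n' > FILE2

# One grep invocation per pattern, so output follows FILE2's order
while read -r pat; do
    grep -- "$pat" FILE1
done < FILE2
```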
Labels: unix case study
UNIX case study - 14
Replace text in another file using awk
I have a script, but the required field is getting replaced with the literal string $name instead of its value:
for name in $(cat temp4)
do
nawk -F'|' -v OFS='|' '$2="$name"' temp2 > temp3
done
Files
temp4
gg
re
tt
vv
qq
temp2
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
output file (temp3)
11|$name|33|44|zz
11|$name|33|44|zz
11|$name|33|44|zz
11|$name|33|44|zz
11|$name|33|44|zz
required output
11|gg|33|44|zz
11|re|33|44|zz
11|tt|33|44|zz
11|vv|33|44|zz
11|qq|33|44|zz
My requirements
===================
file1
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
11|22|33|44|zz
file2
aa|bb|cc1|dd|55
aa|bb|cc2|dd|55
aa|bb|cc3|dd|55
aa|bb|cc4|dd|55
aa|bb|cc5|dd|55
required output
11|22|cc1|44|zz
11|22|cc2|44|zz
11|22|cc3|44|zz
11|22|cc4|44|zz
11|22|cc5|44|zz
I need to replace the 3rd field of file1 with the 3rd field of file2 for all records (the 1st record should be replaced with the 1st record of the other file, and so on).
Both files have the same number of lines as well as fields.
http://www.unix.com/shell-programming-scripting/106742-help-replace-field-one-file-field-another-file.html
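One hedged answer for the file1/file2 requirement: a two-pass awk that reads file2 first (NR==FNR) to remember each line's third field, then substitutes it while reading file1.

```shell
# Sample data matching the question
for i in 1 2 3 4 5; do echo '11|22|33|44|zz'; done > file1
for i in 1 2 3 4 5; do echo "aa|bb|cc$i|dd|55"; done > file2

# Pass 1 (NR==FNR, i.e. file2): remember each line's 3rd field by line number.
# Pass 2 (file1): overwrite $3 with the saved value and print.
awk -F'|' -v OFS='|' 'NR==FNR {a[FNR] = $3; next} {$3 = a[FNR]; print}' file2 file1
```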
Labels: unix case study
UNIX case study - 13
Extract Values from CSV
Hi,
I need to extract values from a CSV file based on some conditions as explained below:
File format details:
1. each set starts with AAA only
2. number of columns is fixed
3. number of rows per set may vary (as they are having different CCC rows)
Now, I need to extract the 3rd column of the AAA line and the 4th column of each line starting with CCC in that set. The values of one data set should form one output line.
For e.g., I have data for 3 sets and the file has the below:
AAA,1,a,b,c,d
BBB,1,j,k,l,m
CCC,1,p,q,r,s
CCC,1,w,x,y,z
AAA,2,j,k,l,m
BBB,2,a,b,c,d
CCC,2,p,q,r,s
AAA,3,w,x,y,z
BBB,3,a,b,c,d
CCC,3,p,q,r,s
CCC,3,m,n,o,p
CCC,3,i,j,k,l
then the output must be (3 lines only as 3 data sets exist)
a,q,x
j,q
w,q,n,j
Please advise.
http://www.unix.com/shell-programming-scripting/106738-extract-values-csv.html
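One possible awk sketch: treat each AAA line as the start of a set, accumulate the CCC values onto a buffered line, and flush that line whenever a new set begins (and once more at the end).

```shell
# Sample data from the question
cat > sets.csv <<'EOF'
AAA,1,a,b,c,d
BBB,1,j,k,l,m
CCC,1,p,q,r,s
CCC,1,w,x,y,z
AAA,2,j,k,l,m
BBB,2,a,b,c,d
CCC,2,p,q,r,s
AAA,3,w,x,y,z
BBB,3,a,b,c,d
CCC,3,p,q,r,s
CCC,3,m,n,o,p
CCC,3,i,j,k,l
EOF

# AAA starts a new set: flush the previous line, seed with column 3.
# CCC appends its column 4. END flushes the last set.
awk -F, '/^AAA/ {if (line != "") print line; line = $3}
         /^CCC/ {line = line "," $4}
         END    {if (line != "") print line}' sets.csv
```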
Labels: unix case study
April 03, 2009
UNIX case study - 12
You can feed the for loop with the output of a function. Don't say nobody told me!!
generate_list ()
{
echo "one two three"
}
for word in $(generate_list)
do
echo "$word"
done
# one
# two
# three
Labels: linux tips, unix case study
February 19, 2009
UNIX case study - 11
You can write PHP code within bash script and take advantage of both worlds!
#!/bin/sh
echo This is the Bash section of the code.
amount=5
/usr/local/bin/php -q << EOF
<?php
\$myVar = "PHP";
print("This is the \$myVar section of $amount the code.\n The value of the variable amount is $amount \n");
?>
EOF
The output will look something like this...
# sh testphp.sh
This is the Bash section of the code.
This is the PHP section of 5 the code.
The value of the variable amount is 5
Labels: unix case study
February 18, 2009
UNIX case study - 10
Create Calendar
You can create a calendar starting from a given day (the loop below covers 730 days, i.e. two years). The days in each month are grouped together, separated by commas and enclosed in apostrophes.
sh /root/calendar.sh '2002-01-01'
#!/bin/sh
mysql -e"drop table if exists test.mycalendar;"
mysql -e"create table test.mycalendar (id int not null auto_increment, dateval date, primary key (id));"
for (( i = 0 ; i < 730 ; i++ ))
do
mysql -e"insert into test.mycalendar (dateval) select '$1' + interval $i day;"
done
mysql -e"select group_concat(concat('to_days(', '\'',dateval,'\')') order by dateval) as '' from test.mycalendar group by extract(year_month from dateval);" | sort
Labels: unix case study
January 04, 2009
UNIX case study - 9
Extract fields
Please see the logs shown below:
12-12 08:47:37.545 DBG AGIML SERVER[1]..............................write() (
999
97735754501317BPL1229051853686229)
Now I want to write a script which will extract the fields like this:
TOPUP,9773575450,1317,BPL,1
these are the fields shown in the different tags.
One thing more: the fields between the tags
............,
are being treated as a single column of the log.
For example, less file | grep "9773575450"
will display the whole line.
http://www.unix.com/shell-programming-scripting/93722-script-extract-fields.html
Labels: unix case study
UNIX case study - 8
Remove two sequential double quotes, but only when the field is NOT null
I have a tab delimited file where each of the strings has double quotes.
The problem is that I have records which are in the following format:
"TEXAS" ""HOUSTON"" "123" "" "2625-39-39"
""MAINE"" "" "456" "I" "3737-39-82"
I would have to output another tab delimited file in the following fashion with the extra double quotes around the string removed:
"TEXAS" "HOUSTON" "123" "" "2625-39-39"
"MAINE" "" "456" "I" "3737-39-82"
This problem persists across multiple fields and multiple rows.
http://tinyurl.com/98jdxv
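A hedged sed sketch: collapse doubled quotes only when they surround at least one non-quote character, so genuinely empty fields ("") are left untouched.

```shell
# Sample tab-delimited data from the question
printf '"TEXAS"\t""HOUSTON""\t"123"\t""\t"2625-39-39"\n' > in.txt
printf '""MAINE""\t""\t"456"\t"I"\t"3737-39-82"\n' >> in.txt

# ""X"" -> "X" when X contains one or more non-quote characters;
# a bare "" (null field) is too short to match and is preserved
sed 's/""\([^"][^"]*\)""/"\1"/g' in.txt
```

This assumes the extra quotes always appear as an immediately doubled pair around a field's content, as in the sample rows.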
Labels: unix case study