Shantanu's Blog

Database Consultant

June 30, 2005

MySQL Case Study - 36

please take a look a this query:

select * from predmeti p, rjesenja r WHERE p.predmetID = r.predmeti_predmetID

As you can see, here I want to display content from table predmeti and rjesenja. But only if there is a foreign key in rjesenja.

This works fine, it shows the desired results, but..

the vise versa version:

select * from predmeti p, rjesenja r WHERE p.predmetID != r.predmeti_predmetID

does not work, here I want to display all from table predmeti, but only if there is
no foreign key in table rjesenja.

What is wrong?

posted by shantanu # Thursday, June 30, 2005 1 comments

MySQL Case Study - 35

I imported a bunch of users using their email address as the username. I need to update every user who has an email address in the username field, to convert the username to just what is before the @ symbol.

Example:

user@host.com becomes user

and

user.name@host.com becomes user.name

Can anybody help with this?

posted by shantanu # Thursday, June 30, 2005 1 comments

June 29, 2005

INSERT ... ON DUPLICATE KEY UPDATE ...

The new (version 4.1) ON DUPLICATE KEY UPDATE ... clause for INSERT statements allows special handling when an inserted row would cause a duplicate value in a UNIQUE index (or a PRIMARY KEY). For example, you could record votes in a poll system using a statement like the following:

mysql> INSERT INTO poll (favorite, votes) VALUES ('Hamburger', 1)
-> ON DUPLICATE KEY UPDATE votes = votes 1;

You can use the new VALUES() function to refer to column values from the INSERT part of the query.

mysql> INSERT INTO poll (favorite, votes, last_voter)
VALUES ('Hamburger', 1, 'Wimpy')
-> ON DUPLICATE KEY UPDATE votes = votes 1,
last_voter = VALUES(last_voter);"

_____

ROLLUP provides summary rows for each GROUP BY level.

An ordinary SELECT ... GROUP BY on this table gives this result.

mysql> SELECT col1, col2, SUM(col3) AS sumcol3
-> FROM T_rollup GROUP BY col1,col2;
+------+------+---------+
| col1 | col2 | sumcol3 |
+------+------+---------+
| 1 | a | 1.10 |
| 1 | b | 2.35 |
| 2 | a | 7.77 |
+------+------+---------+
3 rows in set (0.03 sec)

In contrast, the same query, using WITH ROLLUP, gives this result.

mysql> SELECT col1, col2, SUM(col3) AS sumcol3
-> FROM T_rollup GROUP BY col1,col2 WITH ROLLUP;
+------+------+---------+
| col1 | col2 | sumcol3 |
+------+------+---------+
| 1 | a | 1.10 |
| 1 | b | 2.35 |
| 1 | NULL | 3.45 |
| 2 | a | 7.77 |
| 2 | NULL | 7.77 |
| NULL | NULL | 11.22 |
+------+------+---------+
6 rows in set (0.04)

As the result shows, in addition to the groups returned by the regular GROUP BY query, GROUP BY ... WITH ROLLUP also returns a summary row for each group.

Thus, for the group where col1 equals one (1), col2 gets a NULL to indicate a summary row.
_____

A CREATE TABLE ... LIKE statement is a quick way of cloning the structure of an existing table, including any indexes (but not foreign keys). This can be particularly useful in creating new tables to be added to an existing MERGE table.

mysql> CREATE TABLE log_20041124 LIKE log_20041123;

http://dev.mysql.com/tech-resources/articles/4.1/grab-bag.html

posted by shantanu # Wednesday, June 29, 2005 0 comments

June 28, 2005

Google launches 'Personalized Search'

Personalized Search' (beta) learns from your history of searches and search results you've clicked on, and brings certain results closer to the top when it's clear they're most relevant to you. As you build up your search history, your personalized search results will continue to improve over time.
This is exactly what I had dreamed about in March 04 when Personalized Search was introduced.
http://tinyurl.com/cbhwc

posted by shantanu # Tuesday, June 28, 2005 0 comments

June 27, 2005

MySQL Case Study - 34

I'm looking to list the number of Users who have an entry matching "Page View" in the table UserActions. The query I'm using is:

SELECT COUNT(*) AS Download_Ruequests
FROM (SELECT * FROM UserActions,
(SELECT * FROM UserActions WHERE Action LIKE "View%" GROUP BY UserID) AS Users
WHERE UserActions.Action = "Request Download"
GROUP BY UserActions.UserID) AS Downloads

However, it involves 2 subqueries and I can't help but feel there must be a simpler way...

I want to return users who have viewed a page AND requested a download, but each of these actions is in a different row in the table. So I'm effectively saying, show me the User IDs which have this action and another action in a different row.

Any pointers greatly appreciated.

posted by shantanu # Monday, June 27, 2005 1 comments

MySQL Case Study - 33

I am trying to count the number of rows returned in a result set, and can't figure out how. For example, I have this SQL:

SELECT count(*)
FROM `4images_sessions_hist` sh
WHERE sh.session_user_id <> -1
group by FROM_UNIXTIME(sh.session_lastaction, '%Y-%m-%d %H'), sh.session_user_id, sh.session_ip

It currently returns:
count(*)
1
11
1
1
2

But I would like it to return the number of rows, i.e. 5

posted by shantanu # Monday, June 27, 2005 1 comments

June 26, 2005

Optimizing MySQL: Queries and Indexes

You can provide similar information for the Optimizer by running

ANALYZE TABLE tablename;

This stores the key distribution for the table (running ANALYZE is equivalent to running myisamchk -a or myismachk --analyze).

Many deletes and updates leave gaps in the table (especially when you're using varchar, or in particular text/blob fields). This means there are more unnecessary disk I/O's, as the head needs to skip over these gaps when reading. Running

OPTIMIZE tablename

solves this problem. Both of these statements should be run fairly frequently in any well looked after system.

http://www.databasejournal.com/features/mysql/article.php/10897_1382791_3

posted by shantanu # Sunday, June 26, 2005 0 comments

MySQL Case Study - 32

I have routinely used statements like:
select p.name, t.name from team as t, player as p where p.team=t.id; for example
/*return the team name instead of id reference stored in player table*/

I recently had a case where I have players accociated with up to three teams...

select name, team1, team2, team3 from player limit 1;

looks something like: || John Doe | 1 | 2 | 3 ||

I want to return the team names for all three teams, but because the team id's for team1, team 2 and team three will never match i cannot use this

select p.name, t.name, t.name, t.name
from player as p, team as t
where p.team1=t.id and p.team2=t.id and p.team3=t.id; Odvoiusly won't work

What is the solution for this in mysql 3 < latest 3.xx?

posted by shantanu # Sunday, June 26, 2005 1 comments

June 24, 2005

MySQL Case Study - 31

I'm looking for a query to return values which occur in one table, but not in the other. Say we have two tables (table_1, table_2) each with a column titled "appointments". I want a list of results for appointments which are in table_1, but not in table_2. I have tried the following, but all I get is a huge list of duplicate appointments which occur in both tables:

SELECT table_1.appointments
FROM table_1, table_2
WHERE table_1.appointments<>table_2.appointments;

I've also tried 'NOT IN' for this but the result was just the same.

posted by shantanu # Friday, June 24, 2005 1 comments

Compound Annual growth Rate (CAGR)

You can use the IRR function to calculate the internal rate of return for a series of values. But the XIRR function is used to determine the internal rate of return when there are a number of irregular payments associated with an investment. The XIRR function is provided as part of the Analysis ToolPak add-in. If you put money into your portfolio, and take some out, then in and out etc. etc. then we're NOT talking about a buy-and-hold return. That makes the calculation of Annualized return more complicated, but there's an MS Excel command for this:
The XIRR command. It goes like this:

=XIRR(A1:A50,B1:B50,0.10)

where the range A1:A50 contains the cash flows
and the range B1:B50 contains the dates
and 0.10 is an initial guess (like 0.10 meaning 10%)
_____

The compound annual growth rate (CAGR) is computed by the formula:
[(Pv / P0) (1/n)]-1
where Pv = Present value
P0 = Beginning value
n = Number of periods
The result is multiplied by 100 to obtain a percentage.

CAGR, is the year over year growth rate applied to an investment
or other part of a activity over a multiple-year period.
The formula for calculating CAGR using a spreadsheet is
=(Current Value/Base Value)^(1/# of years) - 1
To change the number into a percentage, multiplying by 100.

current value _____ 100
base value _______ 80
number of years ___ 10
Excel formula_____ =((B1/B2)^(1/B3)-1)
Result: __________ 0.023
To change to a percent
=B4*100
Result: 2.26

posted by shantanu # Friday, June 24, 2005 1 comments

June 22, 2005

Sample tables to learn SQL

CREATE TABLE president (
last_name VARCHAR(15) NOT NULL,
first_name VARCHAR(15) NOT NULL,
suffix VARCHAR(5) NULL, /* name suffix (Jr., II, etc.) */
city VARCHAR(20) NOT NULL,
state VARCHAR(2) NOT NULL,
birth DATE NOT NULL,
death DATE NULL );

The data is available in the same directory. You can run these SQL commands to create a sample table and learn how to query data.

http://www.cs.txstate.edu/~rs01/mysql/dubois/create_president.sql

posted by shantanu # Wednesday, June 22, 2005 0 comments

June 19, 2005

NewsForge | Interview: Axmark and Behlendorf on OSS for India

OSS removes bad proprietary software. Look for example to what happened to the compiler market after gcc came around in the '80s. Some compilers could handle the competition, but most just went away. As a customer, the choices become easier when there are fewer choices.

http://business.newsforge.com/article.pl?sid=05/06/03/1355239&from=rss

posted by shantanu # Sunday, June 19, 2005 0 comments

June 12, 2005

MySQL Case Study - 30

I am trying to get "default" value for every ID of JOIN if ON clause were unsucessfull.
In my example, I am trying to get "english" rows, but if they are not available, I would like to get default - "estonian".
I have two tables ie.

table1:
id
1
2
3

table2:
id - lang - desc
1 - english - EN_A
1 - estonian - ES_A
2 - english - EN_B
2 - estonian - ES_B
3 - estonian - ES_C

now if I join these two tables ie.:
SELECT * FROM table1 LEFT JOIN table2
ON table1.id=table2.id AND table2.lang ='english'
GROUP BY table1.id;

I am getting:
--
id - id - lang - desc
1 - 1 - english - EN_A
2 - 2 - english - EN_B
--

But I would to have "default" row if english is not available:

--
id - id - lang - desc
1 - 1 - english - EN_A
2 - 2 - english - EN_B
3 - 3 - estonian - ES_C
--

I have tried with UNION of two JOINS (one for english, one for estonian) but then I am getting
duplicate rows with ID=(1,2)

posted by shantanu # Sunday, June 12, 2005 1 comments

MySQL Case Study - 29

I need to make an advanced SQL statement. list each patient's name (sorted alphabetically), and the date of their earliest interview that was after 07/01/2003, and the score of the interview on that date. If there were two interviews on that date, list the one with the higher score.

So the output should look like:

PatientName EarliestDate BestScore
-------------------- ------------------------------ -----------
Adams 07/05/2003 60
Bryson 07/03/2003 68
Caldwell 10/03/2003 55

now the schema of the table is
CREATE TABLE tmpPatient(PatientID int, PatientName varchar(20))
CREATE TABLE tmpInterview(InterviewID int, PatientOID int, InterviewDate
datetime, InterviewScore int)

the values of the respective tables is
INSERT tmpPatient (PatientID, PatientName) VALUES (1, 'Adams')
INSERT tmpPatient (PatientID, PatientName) VALUES (2, 'Bryson')
INSERT tmpPatient (PatientID, PatientName) VALUES (3, 'Caldwell')

INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (1, 1, '2/3/2003', 75)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (2, 1, '7/5/2003', 60)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (3, 1, '12/3/2003', 65)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (4, 2, '7/3/2003', 68)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (5, 2, '7/3/2003', 65)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (6, 2, '2/3/2003', 78)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (7, 3, '4/3/2003', 65)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (8, 3, '10/3/2003', 55)
INSERT tmpInterview (InterviewID, PatientOID, InterviewDate,
InterviewScore) VALUES (9, 3, '11/3/2003', 75)

posted by shantanu # Sunday, June 12, 2005 1 comments

MySQL Case Study - 28

I have a table that store different items. For each items, I can attach up to 5 different textual categories. Those categories are free-text, and different columns can have the same values (example below).
I am trying to select the count of each of those categories, regardless of it's position.

The table looks like:
ID - int(11) auto_increment,
Description - varchar(100),
Cat1 - varchar(30),
Cat2 - varchar(30),
Cat3 - varchar(30),
Cat4 - varchar(30),
Cat5 - varchar(30) ... (etc).

Sample data may be:
1, "aaa", "Food", "America", "Cheese", NULL, NULL
2, "bbb", "Drink", "America", "Wines", NULL, NULL
3, "ccc", "Wines", "Drink", NULL, NULL, NULL
4, "ddd", "America", "Food", NULL, NULL

The result I want is
Food - 2
America - 3
Drink - 2
Wines - 2
Cheese - 1

Hope you guys can help (BTW, I'm not too happy with the way the table is
designed, but couldn't think of a better way).

posted by shantanu # Sunday, June 12, 2005 1 comments

MySQL Case Study - 27

Here is my table.
ID | SubID | Amount1 | Amount2 |
25 1 50 100
25 2 25 75
25 3 100 10
26 1 50 60
26 2 25 75

What I want to do is find the sum of Amount1 and Amount2 from each ID where the Subid is Max()

So I basically want
the sum of ID25, SubID3, (100+10)
+
the sum of ID26, SubID2, (25+75)

The value I want returned from my select statement is 210

posted by shantanu # Sunday, June 12, 2005 1 comments

MySQL Case Study - 26

Cross tab?

I have a table of user information like name, country, sex
e.g.
name country sex
jon usa m
lisa malasia f
tom usa m
arnold india m
david india m
nita usa f

i need the result summary country wise and i.e

country total male female
malasia 1 0 1
india 2 2 0
usa 3 1 2
-------------------------------------------
total 6 3 3 --> this i can manage in the reports so don's bother about the total.

I have been able to get it only group by country , but separating sexes is a toil on me
can any one help?

posted by shantanu # Sunday, June 12, 2005 1 comments

June 06, 2005

Sum all item prices except one

Problem:
Columns A:B contains clothing items and their matching price.
How could we create a formula that will sum the prices in column B of all the items except the jacket?

Solution:
Use the SUMIF function as follows:
=SUMIF(A2:A6,""<>Jacket"",B2:B6)

Items____Prices
Shoes____100
T-Shirt__20
Jeans____50
Jacket___150
Socks____5

Result 175

posted by shantanu # Monday, June 06, 2005 0 comments

June 02, 2005

Stock prices from Yahoo!

How about an excel file that will download the current share prices from Yahoo website when you click on "Download Data" button on the sheet?
You can download the original version from here...
http://www.geocities.com/pj_p/Yahoo.ZIP

You can right click on the "Download Data" button and place it anywhere on the sheet.
And more information can be found here...
http://www.gummy-stuff.org/Yahoo-data.htm

posted by shantanu # Thursday, June 02, 2005 0 comments

Microsoft going open source

Microsoft says that its new XML formats will be royalty-free, so the documents should be easily accessible in other applications that support XML. In addition, the XML files will compress everything via Zip compression technology, which will make typical files 50 percent to 75 percent smaller than they would be in today's default Office formats (.doc for Word, .xls for Excel, and .ppt for PowerPoint).

Office 12 won't be the first version of Office to support XML; Office 2003 can save Word and Excel files in XML formats, and Office XP included an XML spreadsheet format. But adopting XML as the default is a major change that should improve interoperability with third-party apps that have struggled with today's famously nebulous and poorly documented Office binary file formats. Even though Microsoft will be using its own schemas, Capossela says that the company will make them widely accessible, which in theory should help competitors such as WordPerfect and OpenOffice handle the new formats.

On the other hand, the new formats will almost certainly cause problems for users of applications designed to work with the existing Office default formats--at least until these apps can be updated to support the XML formats.

http://www.pcworld.com/resource/article/0,aid,121104,pg,1,00.asp

posted by shantanu # Thursday, June 02, 2005 0 comments

Shantanu's Blog

June 30, 2005

MySQL Case Study - 36

MySQL Case Study - 35

June 29, 2005

INSERT ... ON DUPLICATE KEY UPDATE ...

June 28, 2005

Google launches 'Personalized Search'

June 27, 2005

MySQL Case Study - 34

MySQL Case Study - 33

June 26, 2005

Optimizing MySQL: Queries and Indexes

MySQL Case Study - 32

June 24, 2005

MySQL Case Study - 31

Compound Annual growth Rate (CAGR)

June 22, 2005

Sample tables to learn SQL

June 19, 2005

NewsForge | Interview: Axmark and Behlendorf on OSS for India

June 12, 2005

MySQL Case Study - 30

MySQL Case Study - 29

MySQL Case Study - 28

MySQL Case Study - 27

MySQL Case Study - 26

June 06, 2005

Sum all item prices except one

June 02, 2005

Stock prices from Yahoo!

Microsoft going open source

Archives