Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.8k views
in Technique[技术] by (71.8m points)

mysql - Efficient data model for range queries

I'm struggling to think of an efficient model to describe IPv4 address data. I want to be able to perform a 'whois' type lookup on a dataset within MySQL. Currently I have this:

CREATE TABLE inetnum (
 `from_ip` int(11) unsigned NOT NULL,
 `to_ip` int(11) unsigned NOT NULL,
 `netname` varchar(40) default NULL,
 `ip_txt` varchar(60) default NULL,
 `descr` varchar(60) default NULL,
 `country` varchar(2) default NULL,
 `recurse_limit` int(11) NOT NULL default '0',
 `unexpected` int(11) NOT NULL default '0',
 `rir` enum('APNIC','AFRINIC','ARIN','RIPE','LACNIC') NOT NULL default 'RIPE',
 PRIMARY KEY  (`from_ip`,`to_ip`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii;

And I want to do queries like this:

SELECT *
FROM inetnum
WHERE INET_ATON('192.168.0.1') BETWEEN from_ip AND to_ip;

But because the upper and lower bounds of the address range are held in different fields, this results in a full table scan:

mysql> EXPLAIN SELECT * FROM `inetnum` WHERE INET_ATON('192.168.0.1') BETWEEN from_ip AND to_ip;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | inetnum | ALL  | NULL          | NULL | NULL    | NULL | 3800440 | Using where |
 +----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

(and as I'm sure someone will try to point out - not it's not because of the INET_ATON function - using a literal integer makes no difference, nor does using <=to_ip AND >=from_ip).

This is currently running on MySQL 5.0.67. I only have limited scope for changing/upgrading the DBMS.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Actually, your primary key have little sense in terms of such range query. It only indicates unique pairs for <from_ip, to_ip> tuple - thus, MySQL will not be able to use that index with such range comparisons.

Unless you're running some query that involves both parts of your primary key, it will have no effect (well, actually, MySQL also will use it - when selection condition uses left-subset of compound index, but that's not your case). For example, this will use primary key:

-- @x and @y are derived from somewhere else
SELECT * FROM inetnum WHERE from_ip=@x && to_ip=@y

In your case, compound key may be primary key, yes, but it's only benefit will be - to provide uniqueness. So, you can leave it as it is, or create surrogate id primary key (replacing current primary key with UNIQUE constraint).

One of possible solutions to improve situation could be - create single-column keys for from_ip and to_ip. Since they are integers, there's a good chance for high cardinality, that result indexes will have. However, MySQL can use only one index, and, therefore, you'll lose 'half' of range efficient comparison. Also you should remember, that if greater-than (or less-than) comparison will affect too many rows, MySQL will not use index as well (since, obviously, there's no sense in that because there are too much rows to select).

And - yes, avoid using functions in WHERE clause. I'm not saying that MySQL will always loose index usage in such case (but most likely, it will loose it in most cases) - but think about overhead that will cause function call. Even if it's little - you can always get rid of it via passing correct value, formed by your application.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...