yuweijun'blog: September 2007

Saturday, September 29, 2007

[1129]MySQL::Error: Unblock with 'mysqladmin flush-hosts'

MYSQL出现此问题的原因是：
Error: Host '***.***.***.***' blocked because of many connection errors. Unblock with 'mysqladmin flush-hosts'
Errno.: 1129
Similar error report has beed dispatched to administrator before.
该IP因为有太多的错误连接已被锁定,请执行 mysqladmin flush-hosts 来解除锁定.

MySQL最大连接数根据my.cnf不同而不同，最小那个配置文件是100，my-large.cnf那个是256个连接，具体可以在mysql shell下用show variables like 'max_connections'; 查看，可以重新设置此数值。
不过一般发生上面的这个错误就不是靠设置这个数值能解决，主要还是程序本身有错误导致错误请求连接太多，导致后面正常请求也无法访问，在解决程序本身错误后还必须在主DB所在主机上执行mysqladmin -p flush-hosts后，才能重新连接上数据库。

Friday, September 28, 2007

动态增加Apache2.2的Module

$shell > cd httpd－2.2.4
$shell > ./configure --prefix=/usr/local/apache2 --enable-so --enable-mods-shared=all --enable-modules=all
这里需要自己将需要enable的module一起列在configure中，如--enable-proxy --enable-proxy-http --enable-proxy-ftp等，才会生成对应的so文件
$shell > make
$shell > find . -name "*.so"
将找到的需要的so文件，复制到apache的modules目录下，然后修改http.conf文件，Load这些需要的Modules

Thursday, September 27, 2007

[1062]MySQL::Error: Duplicate entry '0' for key 4' on query

数据同步机制发生崩溃，错误提示为 Duplicate entry '0' for key 4' on query，类似错误以前也见过，但一般是因为Unique的索引重复了才报错，这次的错误中提示的0这个Unique Id是不可能有的，主键是自增的id，不会有0出现，猜是对应的表或者表索引文件出问题了，用check table table_name检查了一下，果然有一个警告，二个报错，用rapire table table_name修复，由于表比较大，修复了很旧还是没好，最后拿上次的备份回来重新同步了半天才恢复正常。

Wednesday, September 26, 2007

sniffer帮助理解子网掩码、网关与ARP协议的作用

通过简单的实验深入透析子网掩码，网关与ARP协议的作用
子网掩码，网关与ARP协议的概念和工作原理是学习网络知识的初学者首先遇到的几个重要的知识点，其中子网掩码与ARP协议的作用和基本工作原理更是思科网络技术学院教程Semester 1中的重点与难点，初学者往往难以一下子掌握这些抽象复杂的机理。因此很有必要通过实验来帮助学员更加深入直观地了解子网掩码，网关与ARP协议的基本概念与工作原理。
在对实验进行讲解之前，首先对子网掩码，网关与ARP协议的基本知识进行概述。
子网掩码(Subnet Mask)
子网掩码的主要功能是告知网络设备，一个特定的IP地址的哪一部分是包含网络地址与子网地址，哪一部分是主机地址。网络的路由设备只要识别出目的地址的网络号与子网号即可作出路由寻址决策，IP地址的主机部分不参与路由器的路由寻址操作，只用于在网段中唯一标识一个网络设备的接口。本来，如果网络系统中只使用A、B、C这三种主类地址，而不对这三种主类地址作子网划分或者进行主类地址的汇总，则网络设备根据IP地址的第一个字节的数值范围即可判断它属于A、 B、C中的哪一个主类网，进而可确定该IP地址的网络部分和主机部分，不需要子网掩码的辅助。

但为了使系统在对A、B、C这三种主类网进行了子网的划分，或者采用无类别的域间选路技术(Classless Inter-Domain Routing，CIDR)对网段进行汇总的情况下，也能对IP地址的网络及子网部分与主机部分作正确的区分，就必须依赖于子网掩码的帮助。

子网掩码使用与IP相同的编址格式，子网掩码为1的部分对应于IP地址的网络与子网部分，子网掩码为0的部分对应于IP地址的主机部分。将子网掩码和IP地址作“与”操作后，IP地址的主机部分将被丢弃，剩余的是网络地址和子网地址。例如，一个IP分组的目的IP地址为：10.2.2.1，若子网掩码为：255.255.255.0，与之作“与”运算得：10.2.2.0，则网络设备认为该IP地址的网络号与子网号为：10.2.2.0。

网关(Gateway)
在Internet中的网关一般是指用于连接两个或者两个以上网段的网络设备，通常使用路由器 (Router)作为网关。在TCP/IP网络体系中，网关的基本作用是根据目的IP地址的网络号与子网号，选择最佳的出口对IP分组进行转发，实现跨网段的数据通信。在Semester 1中只需要对网关的基本作用有所了解，在Semester 2中还将对路由器的工作机理和配置过程作详细的论述。

ARP协议(Address Resolution Protocol)
在以太网(Ethernet)中，一个网络设备要和另一个网络设备进行直接通信，除了知道目标设备的网络层逻辑地址(如IP地址)外，还要知道目标设备的第二层物理地址(MAC地址)。ARP协议的基本功能就是通过目标设备的IP地址，查询目标设备的MAC地址，以保证通信的顺利进行。

当一个网络设备需要和另一个网络设备通信时，它首先把目标设备的IP地址与自己的子网掩码进行“与”操作，以判断目标设备与自己是否位于同一网段内。如果目标设备在同一网段内，并且源设备没有获得与目标IP地址相对应的MAC地址信息，则源设备以第二层广播的形式(目标MAC地址为全1)发送 ARP请求报文，在ARP请求报文中包含了源设备与目标设备的IP地址。同一网段中的所有其他设备都可以收到并分析这个ARP请求报文，如果某设备发现报文中的目标IP地址与自己的IP地址相同，则它向源设备发回ARP响应报文，通过该报文使源设备获得目标设备的MAC地址信息。

如果目标设备与源设备不在同一网段，则源设备首先把IP分组发向自己的缺省网关(Default Gateway)，由缺省网关对该分组进行转发。如果源设备没有关于缺省网关的MAC信息，则它同样通过ARP协议获取缺省网关的MAC地址信息。

为了减少广播量，网络设备通过ARP表在缓存中保存IP与MAC地址的映射信息。在一次ARP的请求与响应过程中，通信双方都把对方的MAC地址与 IP地址的对应关系保存在各自的ARP表中，以在后续的通信中使用。ARP表使用老化机制，删除在一段时间内没有使用过的IP与MAC地址的映射关系。

实验设计
我们通过设计一个简单的实验来帮助学员更深入直观地理解上述三个知识点所涉及的基本概念与原理。在实验中，我们利用 ping命令来检验主机间能否进行正常的双向通信。在“ping”的过程中，源主机向目标主机发送ICMP的Echo Request报文，目标主机收到后，向源主机发回ICMP的Echo Reply报文，从而可以验证源与目标主机能否进行正确的双向通信。

A与B为实验用的PC机，使用Windows2000 Professional作操作系统。

实验方案：

步骤1：

设置两台主机的IP地址与子网掩码：
A: 10.2.2.2 255.255.254.0
B: 10.2.3.3 255.255.254.0
两台主机均不设置缺省网关。

用arp -d命令清除两台主机上的ARP表，然后在A与B上分别用ping命令与对方通信，在A与B上分别显示，
A: Reply from 10.2.3.3: bytes=32 time<10ms TTL=128
B: Reply from 10.2.2.2: bytes=32 time<10ms TTL=128
用arp -a命令可以在两台PC上分别看到对方的MAC地址。

分析：由于主机将各自通信目标的IP地址与自己的子网掩码相“与”后，发现目标主机与自己均位于同一网段(10.2.2.0)，因此通过ARP协议获得对方的MAC地址，从而实现在同一网段内网络设备间的双向通信。

步骤2：

将A的子网掩码改为：255.255.255.0，其他设置保持不变。

操作1：用arp -d命令清除两台主机上的ARP表，然后在A上ping B，在A上显示结果为：Destination host unreachable

用arp -a命令在两台PC上均不能看到对方的MAC地址。

分析1：A将目标设备的IP地址(10.2.3.3)和自己的子网掩码(255.255.255.0)相“与”得10.2.3.0，和自己不在同一网段(A所在网段为:10.2.2.0)，则A必须将该IP分组首先发向缺省网关。由于A的缺省网关没有配置，无法对分组进行正确发送，因此显示“目标主机不可到达”。

操作2：接着在B上ping A，在B上显示结果为：Request timed out 此时用arp -a命令可以在两台PC上分别看到对方的MAC地址。

分析2：B将目标设备的IP地址(10.2.2.2)和自己的子网掩码(255.255.254.0)相“与”，发现目标主机与自己均位于同一网段 (10.2.2.0)，因此，B通过ARP协议获得A的MAC地址，并可以正确地向A发送Echo Request报文。但由于A不能向B正确地发回Echo Reply报文(原因见分析1)，故B上显示ping的结果为“请求超时”。在该实验操作中，通过观察A与B的ARP表的变化，可以验证：在一次ARP的请求与响应过程中，通信双方就可以获知对方的MAC地址与IP地址的对应关系，并保存在各自的ARP表中。

步骤3：
在前面实验的基础上，把A的缺省网关设为：10.2.2.1，网关的子网掩码为：255.255.0.0。
在A与B上分别用ping命令与对方通信，各自的显示结果为：
A: Reply from 10.2.3.3: bytes=32 time<10ms TTL=128
B: Reply from 10.2.2.2: bytes=32 time<10ms TTL=127

在A与B上分别用tracert命令追踪数据的传输路径，结果分别为：

A: tracert 10.2.3.3
Tracing route to 10.2.3.3 over a maximum of 30 hops:
1 <10 ms <10 ms <10 ms 10.2.2.1
2 <10 ms <10 ms <10 ms 10.2.3.3
Trace complete.

B: tracert 10.2.2.2
Tracing route to 10.2.2.2 over a maximum of 30 hops:
1 <10 ms <10 ms <10 ms 10.2.2.2
Trace complete.

分析：如步骤2中的分析，由于A认为B与其不在同一个网段，故从A发向B的报文需要经过网关转发；而B认为A与其在同一个网段，故B不需要经过网关直接向A发送报文，从而可以观察到A与B双向通信时传输路径的不对称性。由于ping命令结果显示的是从目标主机返回的Echo Reply报文的TTL的值，而B收到从A返回的Echo Reply报文经过了网关的转发，所以在B中显示该IP报文的TTL值降为了127(从A发出的IP分组的TTL的初始值为128，每经过一个网关， TTL值减1)。

步骤4：
用arp -d命令清除A中的ARP表，在A上ping一台外网段的主机，如中大的WWW Server(202.116.64.8)，再用arp -a可观察到A的ARP表中只有缺省网关的MAC地址信息。

分析：当源主机要和外网段的主机进行通信时，它并不需要获取远程主机的MAC地址，而是把IP分组发向缺省网关，由网关IP分组的完成转发过程。如果源主机没有缺省网关MAC地址的缓存记录，则它会通过ARP协议获取网关的MAC地址，因此在A的ARP表中只观察到网关的MAC地址记录，而观察不到远程主机的MAC地址。
转自51CTO.com

Tuesday, September 25, 2007

[learning Python] built-in objects preview

Built-in objects preview

Object type
Example literals/creation

Numbers
3.1415, 1234, 999L, 3+4j

Strings
'spam', "guido's"

Lists
[1, [2, 'three'], 4]

Dictionaries
{'food': 'spam', 'taste': 'yum'}

Tuples
(1,'spam', 4, 'U')

Files
text = open('eggs', 'r').read( )

Perl内置数据类型

一、标量scalar是 Perl 中最简单的数据类型。大多数的标量是数字(如 255 或 3.25e20)或者字符串("Hello World!")
Perl 不同于其它的一些语言 ,它没有 Boolean 类型。它利用如下几条规则:
1. 如果值为数字,0 是 false;其余为真
2. 如果值为字符串,则空串( ‘’)为 false;其余为真
3. 如果值的类型既不是数字又不是字符串,则将其转换为数字或字符串后再利用上述规则
4. 这些规则中有一个特殊的地方。由于字符串‘0’和数字 0 有相同的标量值,Perl 将它们相同看待。也就是说字符串‘0’是唯一
一个非空但值为 0 的串

二、列表list是标量的有序集。数组是包含列表的变量。在 Perl 中这个两个术语是可以互换的。但严格意义上讲,列表是指数据, 而数组是其变量名。可以有一些值(列表)但不属于数组;但每一个数组标量都有一个列表,虽然其可以为空。
列表中每一个元素都是一个独立的标量值。这些值是有顺序的,也就是说,这些值从开头到最后一个元素有一个固定的序列。数组或者列表中的元素是编了号的,其索引从整数 0 开始,依次增一,因此数组或者列表第一个元素的索引为 0。
数组是由括号括起来并且其元素由逗号分隔开的列表。这些值组成了数组的元素: (1,2 ,3) # 含有 1 ,2,3 的列表。

三、哈希hash是一种数据结构,和数组类似,可以将值存放到其中,或者从中取回值。但是,和数组不同的是,其索引不是数字而是任意的唯一的字符串，称作key。

Rails migration plugins from Mr.err

sexy db migration:


class UpdateYourFamily < ActiveRecord::Migration
  create_table :updates do |t|
    t.column :user_id,  :integer
    t.column :group_id, :integer
    t.column :body,     :text
    t.column :type,     :string

    t.column :created_at, :datetime
    t.column :updated_at, :datetime 
  end

  def self.down
    drop_table :updates
  end
end

Into this:


class UpdateYourFamily < ActiveRecord::Migration
  create_table :updates do
    foreign_key :user
    foreign_key :group

    text   :body
    string :type

    timestamps!
  end

  def self.down
    drop_table :updates
  end
end

Using this:
SVN:
$ ./script/plugin install \ svn://errtheblog.com/svn/plugins/sexy_migrations

auto db migration, change this:


ActiveRecord::Schema.define(:version => 1) do
  create_table :posts do |t|
    t.string  :title
    t.text    :body
  end
end

into


ActiveRecord::Schema.define(:version => 1) do
  create_table :posts do |t|
    t.string  :title
    t.text    :body
    t.integer :published
  end

  create_table :comments do |t|
    t.string  :name, :url
    t.text    :body
    t.integer :post_id
  end
end

and run:
$ rake db:auto:migrate

it’ll execute the following:

-- add_column("posts", :published, :integer)
-> 0.0096s
-- create_table(:comments)
-> 0.0072s

Pretty slick. Run the task again and nothing will happen, just like regular migrations, but change the file and the plugin will do its best to figure out what you’ve done.
and support index:


ActiveRecord::Schema.define(:version => 1) do
  create_table :posts do |t|
    t.string  :title
    t.text    :body
    t.integer :published
  end

  add_index :posts, :published

  create_table :comments do |t|
    t.string  :name, :url
    t.text    :body
    t.integer :post_id
  end
end

Followed by:

$ rake db:auto:migrate
-- add_index("posts", ["published"])
-> 0.0216s


ActiveRecord::Schema.define(:version => 1) do
  create_table :posts do |t|
    t.string  :title
    t.text    :body
    t.integer :published
  end

  # add_index :posts, :published

  create_table :comments do |t|
    t.string  :name, :url
    t.text    :body
    t.integer :post_id
  end
end

And auto-migrate again:

$ rake db:auto:migrate
-- remove_index("posts", {:name=>"index_posts_on_published"})
-> 0.0187s

Check it Out:
Warehouse: http://plugins.require.errtheblog.com/browser/auto_migrations
SVN: svn://errtheblog.com/svn/plugins/auto_migrations

Monday, September 24, 2007

Javascript 检测中文字符集的正则表达式

// 校验中文字符
alert(/[\u4E00-\u9FA5]+/.test('中文'));

网上看到的一个Ruby风格的JS源码


Object.prototype.clone = function(deepClone) {
  var result = new this.constructor()
  for (var property in this) {
    if (deepClone && typeof(this[property]) == 'object') {
      result[property] = this[property].clone(deepClone)
    } else {
      result[property] = this[property]
    }
  }
  return(result)
}

Object.prototype.extend = function(other) {
  if (!this.mixins) this.mixins = []
  this.mixins.push(other)
  for (var property in other)
    if (!this.hasOwnProperty(property))
      this[property] = other[property]
}

Object.prototype.cmp = function(other) {
  if (this < other) return(-1)
  if (this > other) return(+1)
  return(0)
}

Object.prototype.valuesAt = function() {
  var obj = this
  return(arguments.toArray().map(function(index) {
    return(obj[index])
  }))
}

Object.prototype.toArray = function() {
  if (!this.length) throw("Can't convert")
  var result = []
  for (var i = 0; i < this.length; i++)
    result.push(this[i])
  return(result)
}

Object.prototype.hash = function() {
  return(this.toSource().hash())
}

Object.prototype.instanceOf = function(klass) {
  return(this.constructor == klass)
}

Object.prototype.isA = Object.prototype.kindOf = function(klass) {
  if (this.instanceOf(klass)) return(true)
  if (this["mixins"] != undefined && this.mixins.includes(klass))
    return(true)
  return(false)
}

Object.prototype.methods = function() {
  var result = []
  for (var property in this)
    if (typeof(this[property]) == "function")
      result.push(property)
  return(result)
}

Object.prototype.respondTo = function(method) {
  return(this.methods().includes(method))
}

Object.prototype.send = function(method) {
  var rest = arguments.toArray().last(-1)
  if (!this.respondTo(method)) throw("undefined method")
  return(this[method].apply(this, rest))
}

Object.prototype.instanceEval = function(code) {
  if (code.isA(Function))
    return(code.apply(this))
  else
    return(eval(code.toString()))
}

Number.prototype.times = function(block) {
  for (var i = 0; i < this; i++) block(i)
}

Number.prototype.upto = function(other, block) {
  for (var i = this; i <= other; i++) block(i)
}

Number.prototype.downto = function(other, block) {
  for (var i = this; i >= other; i--) block(i)
}

Number.prototype.towards = function(other, block) {
  var step = this.cmp(other)
  for (var i = this; i !== other - step; i -= step)
    block(i) 
}

Number.prototype.succ = function() { return(this + 1) }
Number.prototype.pred = function() { return(this - 1) }

Number.prototype.chr = function() { return(String.fromCharCode(this)) }

enumerable = new Object()
enumerable.eachWindow = function(window, block) {
  if (!window.isA(Range)) window = range(0, window)
  elements = [], pushed = 0
  this.each(function(item, index) {
    elements.push(item)
    pushed += 1
    if (pushed % window.rend == 0) {
      start = [0, window.start - window.rend + pushed].max()
      end = [0, window.rend + pushed].max()
      block(elements.fetch(xrange(start, end)), index)
    }
  })
}

enumerable.collect = enumerable.map = function(block) {
  var result = []
  this.each(function(item, index) {
    result.push(block(item, index))
  })
  return(result)
}

enumerable.toArray = enumerable.entries = function() {
  return(this.map(function(item) { return(item) }))
}

enumerable.inject = function(firstArg) {
  var state, block, first = true
  if (arguments.length == 1) {
    block = firstArg
  } else {
    state = firstArg
    block = arguments[1]
  }
  this.each(function(item, index) {
    if (first && typeof(state) == "undefined")
      state = item, first = false
    else
      state = block(state, item, index)
  })
  return(state)
}

enumerable.find = enumerable.detect = function(block) {
  var result, done
  this.each(function(item, index) {
    if (!done && block(item, index)) {
      result = item
      done = true
    }
  })
  return(result)
}

enumerable.findAll = enumerable.select = function(block) {
  return(this.inject([], function(result, item, index) {
    return(block(item, index) ? result.add(item) : result)
  }))
}

enumerable.grep = function(obj) {
  return(this.findAll(function(item) {
    return(obj.test(item))
  }))
}

enumerable.reject = function(block) {
  return(this.select(function(item, index) {
    return(!block(item, index))
  }))
}

enumerable.compact = function() {
  return(this.select(function(item) {
    return(typeof(item) != "undefined")
  }))
}

enumerable.nitems = function() { return(this.compact().length) }

enumerable.sortBy = function(block) {
  return(this.map(function(item, index) {
    return([block(item, index), item])
  }).sort(function(a, b) {
    return(a[0].cmp(b[0]))
  }).map(function(item) {
    return(item[1])
  }))
}

enumerable.all = function(block) {
  return(this.findAll(block).length == this.length)
}

enumerable.any = function(block) {
  return(typeof(this.find(block)) != "undefined")
}

enumerable.includes = function(obj) {
  return(this.any(function(item) {
    return(item === obj)
  }))
}

enumerable.index = function(obj) {
  var result
  this.find(function(item, index) {
    if (obj == item) {
      result = index
      return(true)
    } else {
      return(false)
    }
  })
  return(result)
}

enumerable.uniq = function() {
  return(this.inject([], function(result, item) {
    return(result.includes(item) ? result : result.add(item))
  }))
}

enumerable.max = function(block) {
  if (!block) block = function(a, b) { return(a.cmp(b)) }
  return(this.sort(block).last())
}

enumerable.min = function(block) {
  if (!block) block = function(a, b) { return(a.cmp(b)) }
  return(this.sort(block).first())
}

enumerable.partition = function(block) {
  var positives = [], negatives = []
  this.each(function(item, index) {
    if (block(item, index))
      positives.push(item)
    else
      negatives.push(item)
  })
  return([positives, negatives])
}

enumerable.zip = function() {
  var ary = arguments.toArray()
  ary.unshift(this)
  return(ary.transpose())
}

enumerable.flatten = function(depth) {
  if (depth == undefined) depth = -1
  if (!depth) return(this)
  return(this.inject([], function(result, item) {
    var flatItem = item.respondTo("flatten") ? item.flatten(depth - 1) : [item]
    return(result.merge(flatItem))
  }))
}

Array.fromObject = function(obj) {
  if (!obj.length) throw("Can't convert")
  var result = []
  for (var i = 0; i < obj.length; i++)
    result.push(obj[i])
  return(result)
}

Array.prototype.transpose = function() {
  var result, length = -1
  this.each(function(item, index) {
    if (length < 0) { /* first element */
      length = item.length
      result = Array.withLength(length, function() {
        return(new Array(this.length))
      })
    } else if (length != item.length) {
      throw("Element sizes differ")
    }
    item.each(function(iitem, iindex) {
      result[iindex][index] = iitem
    })
  })
  return(result)
}

Array.withLength = function(length, fallback) {
  var result = [null].mul(length)
  result.fill(fallback)
  return(result)
}

Array.prototype.each = function(block) {
  for (var index = 0; index < this.length; ++index) {
    var item = this[index]
    block(item, index)
  }
  return(this)
}
Array.prototype.extend(enumerable)

Array.prototype.isEmpty = function() { return(this.length == 0) }

Array.prototype.at = Array.prototype.fetch = function(index, length) {
  if (index.isA(Range)) {
    var end = index.rend + (index.rend < 0 ? this.length : 0)
    index = index.start
    length = end - index + 1
  }
  if (length == undefined) length = 1
  if (index < 0) index += this.length
  var result = this.slice(index, index + length)
  return(result.length == 1 ? result[0] : result)
}

Array.prototype.first = function(amount) {
  if (amount == undefined) amount = 1
  return(this.at(xrange(0, amount)))
}

Array.prototype.last = function(amount) {
  if (amount == undefined) amount = 1
  return(this.at(range(-amount, -1)))
}

Array.prototype.store = function(index) {
  var length = 1, obj
  arguments = arguments.toArray()
  arguments.shift()
  if (arguments.length == 2)
    length = arguments.shift()
  obj = arguments.shift()
  if (!obj.isA(Array)) obj = [obj]
  if (index.isA(Range)) {
    var end = index.rend + (index.rend < 0 ? this.length : 0)
    index = index.start
    length = end - index + 1
  }
  if (index < 0) index += this.length
  this.replace(this.slice(0, index).merge(obj).merge(this.slice(index + length)))
  return(this)
}

Array.prototype.insert = function(index) {
  var values = arguments.toArray().last(-1)
  if (index < 0) index += this.length + 1
  return(this.store(index, 0, values))
}

Array.prototype.update = function(other) {
  var obj = this
  other.each(function(item) { obj.push(item) })
  return(obj)
}

Array.prototype.merge = Array.prototype.concat
Array.prototype.add = function(item) { return(this.merge([item])) }

Array.prototype.clear = function() {
  var obj = this
  this.length.times(function(index) {
    delete obj[index]
  })
  this.length = 0
}

Array.prototype.replace = function(obj) {
  this.clear()
  this.update(obj)
}

Array.prototype.mul = function(count) {
  var result = []
  var obj = this
  count.times(function() { result = result.merge(obj) })
  return(result)
}

Array.prototype.fill = function(value) {
  var old_length = this.length
  var obj = this
  this.clear()
  var block
  if (typeof(value) != "function")
    block = function() { return(value) }
  else
    block = value

  old_length.times(function(i) {
    obj.push(block(i))
  })
}

Array.prototype.removeAt = function(targetIndex) {
  var result = this[targetIndex]
  var newArray = this.reject(function(item, index) {
    return(index == targetIndex)
  })
  this.replace(newArray)
  return(result)
}

Array.prototype.remove = function(obj) {
  this.removeAt(this.index(obj))
}

Array.prototype.removeIf = function(block) {
  this.replace(this.reject(block))
}

function Range(start, end, excludeEnd) {
  this.begin = this.start = start
  this.end = end
  this.excludeEnd = excludeEnd
  this.rend = excludeEnd ? end.pred() : end
  this.length = this.toArray().length
}

function range(start, end) { return(new Range(start, end)) }
function xrange(start, end) { return(new Range(start, end, true)) }

Range.prototype.toString = function() {
  return("" + this.start + (this.excludeEnd ? "..." : "..") + this.end)
}

Range.prototype.each = function(block) {
  var index = 0
  this.start.towards(this.rend, function(i) {return(block(i, index++))})
}
Range.prototype.extend(enumerable)

Range.prototype.includes = function(item) {
  return(this.start.cmp(item) == -1 && this.rend.cmp(item) == +1)
}

function Hash(defaultBlock) {
  this.defaultBlock = defaultBlock
  this.keys = []
  this.values = []
  this.length = 0
}

Hash.fromArray = function(array) {
  var result = new Hash()
  array.each(function(item) {
    var key = item[0], value = item[1]
    result.store(key, value)
  })
  return(result)
}

Hash.prototype.at = Hash.prototype.fetch = function(key, block) {
  var result
  if (this.hasKey(key))
    result = this["item_" + key.hash()]
  else {
    if (block) 
      result = block(key)
    else
      result = defaultBlock(key)
  }
  return(result)
}

Hash.prototype.store = function(key, value) {
  this.keys.push(key)
  this.values.push(value)
  this.length++
  return(this["item_" + key.hash()] = value)
}

Hash.prototype.toA = function() {
  return(this.keys.zip(this.values))
}

Hash.prototype.isEmpty = function() {
  return(this.length == 0)
}

Hash.prototype.has = Hash.prototype.includes = Hash.prototype.hasKey = function(key) {
  return(hasOwnProperty("item_" + key.hash()))
}

Hash.prototype.hasValue = function(value) {
  return(this.values.includes(value))
}

Hash.prototype.each = function(block) {
  this.toA().each(function (pair) {
    return(block(pair[1], pair[0]))
  })
}

Hash.prototype.extend(enumerable)

Hash.prototype.merge = function(other) {
  other.each(function(value, key) {
    this.store(key, value)
  })
}

Hash.prototype.remove = function(key) {
  var valueIndex = this.keys.index(key)
  var value = this.values[valueIndex]
  this.keys.remove(key)
  this.values.removeAt(valueIndex)
  delete(this["item_" + key.hash()])
  this.length--
  return([key, value])
}

Hash.prototype.removeIf = function(block) {
  this.each(function(value, key) {
    if (block(value, key))
      this.remove(key)
  })
}

Hash.prototype.shift = function() {
  return(this.remove(this.keys[0]))
}

Hash.prototype.clear = function() {
  var obj = this
  this.length.times(function() {obj.shift()})
}

Hash.prototype.replace = function(obj) {
  this.clear()
  this.merge(obj)
}

Hash.prototype.invert = function() {
  return(Hash.fromArray(this.map(function(value, key) {
    return([value, key])
  })))
}

Hash.prototype.rehash = function() {
  var result = new Hash(this.defaultBlock)
  this.each(function(value, key) {
    result.store(key, value)
  })
  this.replace(result)
}

function MatchData(matches, str, pos) {
  this.matches = matches, this.string = str
  this.begin = this.position = pos
  this.match = matches[0]
  this.captures = matches.slice(1)
  this.end = pos + this.match.length
  this.length = matches.length
  this.preMatch = str.substr(0, pos)
  this.postMatch = str.substr(this.end)
}

MatchData.prototype.toString = function() { return(this.match) }
MatchData.prototype.at = function(index) {
  return(this.matches.at(index))
}
MatchData.prototype.toArray = function() { return(this.matches) }

RegExp.prototype.match = function(str) {
  var matches
  if (matches = this.exec(str)) {
    var pos = str.search(this)
    return(new MatchData(matches, str, pos))
  }
}

String.prototype.clone = function() { return(new String(this)) }

String.prototype.each = function(block) {
  this.split("\n").each(block)
}

String.prototype.extend(enumerable)

String.prototype.toArray = function() { return(this.split("\n")) }

String.prototype.towards = function(other, block) {
  var item = this
  while (item.cmp(other) <= 0) {
    block(item)
    item = item.succ()
  }
}

String.prototype.hash = function() {
  var result = 0
  this.split("").each(function(item) {
    result += item.charCodeAt(0)
    result += (result << 10)
    result ^= (result >> 6)
  })
  result += (result << 3)
  result ^= (result >> 11)
  result += (result << 15)
  return(result)
}

String.prototype.chars = function() { return(this.split("")) }

String.prototype.at = String.prototype.fetch = function(index, length) {
  if (index.isA(Range)) {
    var end = index.rend + (index.rend < 0 ? this.length : 0)
    index = index.start
    length = end - index + 1
  }
  if (length == undefined) length = 1
  if (index < 0) index += this.length
  return(this.substr(index, length))
}

String.prototype.store = String.prototype.change = function(index) {
  var length = 1, obj
  arguments = arguments.toArray()
  arguments.shift()
  if (arguments.length == 2)
    length = arguments.shift()
  obj = arguments.shift()
  if (index.isA(Range)) {
    var end = index.rend + (index.rend < 0 ? this.length : 0)
    index = index.start
    length = end - index + 1
  }
  if (index < 0) index += this.length
  return(this.substr(0, index) + obj + this.substr(index + length))
}

String.prototype.reverse = function() {
  return(this.split("").reverse().join(""))
}

String.prototype.scan = function(pattern) {
  var str = this, result = [], oldPos = -1, match, offset = 0
  while (match = pattern.match(str)) {
    if (match.end == match.begin)
      throw("Can't have null length matches with scan()")
    var newMatch = new MatchData(match.matches, match.string, match.position + offset)
    result.push(newMatch)
    str = match.postMatch
    offset += match.toString().length
  }
  return(result)
}

String.prototype.sub = function(what, by, global) {
  var block = typeof(by) == "function" ? by : function() { return(by) }
  var matches = this.scan(what), result = this, offset = 0
  if (!global && !by.global) matches = matches.slice(0, 1)
  matches.each (function(match) {
    var replacement = block(match)
    offset += replacement.length - match.toString().length
    result = result.change(match.begin + offset, match.toString().length, replacement)
  })
  return(result)
}
String.prototype.gsub = function(what, by) { return(this.sub(what, by, true)) }

String.prototype.tr = function(from, to) {
  var map = Hash.fromArray(from.chars().zip(to.chars()))
  return(this.chars().map(function(chr) {
    return(map.includes(chr) ? map.fetch(chr) : chr)
  }).join(""))
}

String.prototype.mul = function(other) {
  var result = "", str = this
  other.times(function() { result += str })
  return(result)
}

String.prototype.isUpcase = function() { return(this == this.upcase()) }
String.prototype.isDowncase = function() { return(this == this.downcase()) }
String.prototype.isCapitalized = function() {
  return(this.fetch(0).isUpcase() && this.fetch(range(1, -1)).isDowncase())
}
String.prototype.upcase = String.prototype.toUpperCase
String.prototype.downcase = String.prototype.toLowerCase
String.prototype.capitalize = function() {
  return(this.fetch(0).upcase() + this.fetch(range(1, -1)).downcase())
}
String.prototype.swapcase = function() {
  return(this.chars().map(function(chr) {
    if (chr.isUpcase()) return(chr.downcase())
    if (chr.isDowncase()) return(chr.upcase())
    return(chr)
  }).join(""))
}
String.prototype.ord = function() { return(this.charCodeAt(0)) }

String.prototype.isEmpty = function() { return(this.length == 0) }

String.prototype.succ = function() {
  if (this.isEmpty()) return(this)
  /* numerics */
  if (/^\d+$/.test(this))
    return((Number(this) + 1).toString())
  /* just one character */
  if (this.length == 1) {
    /* letters */
    if (/[A-Za-z]/.test(this)) {
      var lastLetter = this.isUpcase() ? 'Z' : 'z'
      var firstLetter = this.isUpcase() ? 'A' : 'a'
      return((this == lastLetter) ? firstLetter.mul(2) : (this.ord() + 1).chr())
    } else {
      return(this == (-1).chr() ? 0.0.chr().mul(2) : (this.ord() + 1).chr())
    }
  /* multiple characters */
  } else {
    var result = this
    for (var index = this.length; index >= 0; index--) {
      var chr = this.at(index)
      if (chr.succ().length == 1 || index == 0)
        return(result.change(index, chr.succ()))
      else
        result = result.change(index, chr.succ().at(-1))
    }
  }
}

String.prototype.ljust = function(length, fill) {
  if (!fill) fill = " "
  if (fill.length > 1) throw("TODO: Make fills with length > 1 work.")
  return(this + fill.mul(length / fill.length - this.length))
}

MySQL::ERROR Client does not support authentication protocol requested by server; consider upgrading MySQL client

将Linux下的MySQL5.0.37的data文件夹移到Windows平台下MySQL5.1.20后,Rails程序报如下错误：
Client does not support authentication protocol requested by server; consider upgrading MySQL client
这个错误以前在用MySQL从老版本转到MySQL4.1.10时候碰到过，不知到5.1.20还碰到这个问题，还是用原来的方法修改下数据中mysql那个库中的用户密码即可：
SET PASSWORD FOR user@localhost = OLD_PASSWORD('password');

Sunday, September 23, 2007

Perl中对@array和%hash的排序


#! /usr/bin/perl -w
use strict;
 
# 如果在最终结果中$a 出现在$b 之前,则其排序子程序返回-1。如果$b 出现在$a 之前,则返回 1。 
# 如果$a 和$b 的顺序无关紧要,则子程序返回 0。为什么它无关紧要呢?也许你正在做一个大小写无关的排序,而这两个字 
# 符串是 fred 和 Fred 。也许你正在做一个数字排序,而这两个元素相等。
sub by_number {
 if ($a > $b) {-1} elsif ($a < $b) {1} else {0}
}

my @nums = (1, 6, 2, 7, 3, 8, 4, 9, 5);
@nums = sort by_number @nums;
print "@nums\n";

# 针对本例,我们使用太空船(spaceship)符号(<=>)。这个操作符比较两个数字,按照数字将其排序,并返回-1, 0, 1。
sub by_numerically { $a <=> $b };
@nums = sort by_numerically @nums;
print "@nums\n";

sub case_insenstive { "\L$a" cmp "\L$b"};
print sort case_insenstive ('Last ', 'First ', 'second ', 'third ');
print "\n\n";

my %score = (
 "barney" => 195, 
 "fred" => 205, 
 "dino" => 30, 
 "bam-bamm" => 195, 
);

sub by_score_and_name {
 $score{$a} <=> $score{$b}
  or
 $a cmp $b;
}
# or的优先级低，在前面太空船'<=>'比较结果后，如果返回的为0，则计算后面的比较。
my @winners = sort by_score_and_name keys %score;
print "@winners\n";

Array的排序方法sort与Javascript, Ruby基本一致。Javascript sort调用方法举例如下：


 function by_number(a, b) {
  if (a > b) {
   return 1;
  } else if (a < b) {
   return -1;
  } else {
   return 0;
  }
 }
 
 function by_number_reverse(a, b) {
  if (a > b) {
   return -1;
  } else if (a < b) {
   return 1;
  } else {
   return 0;
  }
 }
 var myArray = [2, 4, 2, 17, 50, 8];
 alert( myArray.sort() ); 
 alert( myArray.sort(by_number));
 alert( myArray.sort(by_number_reverse));

Friday, September 21, 2007

.(period) 在Javascript/Ruby/Perl/PHP中的区别

在Ruby/Perl/PHP中：
. (period) [^\n] 匹配除换行符(\n)之外所有字符
在Javascript中：
. (period) [^\n\r] Any character except new line and carriage return

在Ruby/Javascript中：
可以用m这个选项使得.period匹配换行符\n，不过不匹配Javascript的\r。
在PHP/Perl中：
可以用s这个选项使得.period匹配换行符\n。在PHP/Perl中的m选项是使得正则表达式里的^和$能匹配字符串的多行。

关于Shell中的模式(glob)与正则表达式(Regular Expression)

不要将正则表达式和 shell 中的文件名匹配模式,globs 混淆了。通常 glob 是指,在 Unix shell 下输入*.pm 将匹配所有结尾为.pm 的文件名，globs 有时也被称作模式。但严重的问题是,某些面向初级用户的书籍(可能是菜鸟写得)将 globs 叫做“正则表达式”,这绝对是错误的。
Ruby中Dir Class 中有个方法[]，说明为：Equivalent to calling Dir.glob(glob_pattern, 0)，而在Dir.glob方法中则有二种用法：
Dir.glob( glob_pattern,

[1071]Mysql::Error: Specified key was too long; max key length is 1000 bytes

在Windows XP下用Mysql5.1.20创建一个表索引碰到这个错误，错误号1071，表为GBK编码，MyISAM引擎。Google了一下，这个在Mysql5.2.0之前是个Bug，改用默认的Latin1字符集就可以避过这个问题，未验证，但是在CentOS 5.0下安装的Mysql5.0.45这个错误并不会发生，具体跟操作系统还有些关系。
错误原因说明及解决方法如下：
建立索引时，数据库计算key的长度是累加所有Index用到的字段的char长度后再按下面比例乘起来不能超过限定的key长度1000：
latin1 = 1 byte = 1 character
uft8 = 3 byte = 1 character
gbk = 2 byte = 1 character
举例能看得更明白些，以GBK为例：
CREATE UNIQUE INDEX `unique_record` ON reports (`report_name`, `report_client`, `report_city`);
其中report_name varchar(200), report_client varchar(200), report_city varchar(200)
(200 + 200 +200) * 2 = 1200 > 1000，所有就会报1071错误，只要将report_city改为varchar(100)那么索引就能成功建立。
如果表是UTF8字符集，那索引还是建立不了。

Thursday, September 20, 2007

learning Perl's subroutine


#! /usr/bin/perl -w

sub marine {
 $n += 1;
 print "subroutine marine $n \n";
}

&marine;
&marine;
$x = &marine;
print $x;
print "\n\n";

$x = 3;
$y = 4;

sub multiply {
 $x * $y;
}

print &multiply;
print "\n";
$z = 5 * &multiply;
print $z;

@_ = (4, 5, 6, 8, 7);
sub add {
 $_[0] + $_[1];
}

print "\n";
print &add(1, 2, 3);
print "\n";
print add(1, 2, 3);
print "\n";
print &add(1);
print "\n";
print add(1);
print "\n";
print &add; # 除 非 调 用 的 子 程 序 前 有 & 而 后 面 没 有 括 号 ( 或 者 没 有 参 数 ) , 此 时 @_ 从 此 调 用 者 的 上 下 文 ( context )  得 到 。 这 通 常 不 是 个 好 主 意 , 但 有 时 很 有 用 。
print "\n";
print add; 
print "\n\n";

sub max {
 my($m) = shift @_;
 foreach (@_) {
  my $cur = $_; # can use "my" in foreach
  print $cur;
  print "\n";
  if ($_ > $m) {$m = $_}
 }
 $m;
}
print &max;

The Elements of JavaScript Style (Part One)

Part One

Douglas Crockford

The Department of Style

2005-09-19

Programming is difficult. At its core, it is about managing complexity. Computer
programs are the most complex things that humans make. Quality is a illusive
and elusive.

Good architecture is necessary to give programs enough structure to be able to grow
large without collapsing into a puddle of confusion, but the ways in which we
express the details of a program are equally important. A program's true nature
can be concealed by sloppy coding. Only when the presentation of a program is
clear can we have any hope of reasoning correctly about its efficiency, or security,
or correctness.

The classic work in literary style is William Strunk's The
Elements of Style, a skinny manual on writing in English, with advice on
usage, composition, and form. The idea of style was applied unsuccessfully to
programming in Kreitzberg and Shneiderman's The
Elements of FORTRAN Style in 1972, and then brilliantly in Kernighan and
Plauger's The Elements of Programming Style in 1978:

Good programming cannot be taught by preaching generalities. The way to learn
to program well is by seeing, over and over, how real programs can be improved
by the application of a few principles of good practice and a little common
sense.

They took programs culled from other programming textbooks, which they criticized
and improved.

When we talk of style here, we are not talking about fads and fashions,
nor are we talking about CSS or conventions of layout or typography. We are
talking about timeless qualities of expression which can substantially increase
the value of a codebase. For companies whose valuations are are inextricably
bound to their codebases, style should be a vital concern.

We use many programming languages, but in a way, JavaScript is the most important.
It is the language of the browser. When people come to our site, they are (perhaps
unknowingly) inviting our JavaScript programs to execute in their machines.
We have a special obligation to make those programs good.

There are no good texts on JavaScript
programming. Most of the people on the web who are producing JavaScript programs
learned it by copying really bad examples from bad books, bad websites, and
bad tools. We have
an amazingly good community of JavaScript programmers here, but still we can
benefit from better practice of style.

To demonstrate this, I will be taking programs from our public website, and
showing how they can be improved. It is not my intention to embarrass anyone.
My intention is only to show the value of style by example. I will be revealing
no secrets: I will be showing you what we are already transmitting to everyone
in the world.

The following examples were extracted from www.yahoo.com
on 2005-09-19.

<script language=javascript><!--
     lck='',
     sss=1127143538,
     ylp='p.gif?t=1127143538&_ylp=A0Je5ipy2C5D54AAwVX1cSkA',
     _lcs='';
--></script>

This script block uses the language attribute. This was a feature
that was introduced by Microsoft in order to support VBScript. Netscape then
adopted it to support its own nonstandard deviations. W3C did not adopt the
language attribute, favoring instead a type attribute
which takes a MIME type. Unfortunately, the MIME type was not standardized,
so it is sometimes "text/javascript" or "application/ecmascript"
or something else. Fortunately, all browsers will always choose JavaScript as
the default programming language, so it is always best to simply write <script>.
It is smallest, and it works on the most browsers.

The use of HTML comments in scripts dates further back to a transitional problem
between Netscape Navigator and Netscape Navigator 2. The latter introduced the
<script> tag. However, users of the former would see the
script as text because of the HTML convention that unrecognized markup is ignored.
The <!-- comment hack stopped being necessary by the time Netscape
Navigator 3 came out. It certainly is not needed now. It is ugly and a waste
of space.

The comma operator was borrowed, like much of JavaScript's syntax, from C.
The comma operator takes two values and returns the second one. Its presence
in the language definition tends to mask certain coding errors, so compilers
tend to be blind to some mistakes. It is best to avoid the comma operator, and
use the semicolon statement separator instead.

In this case, we are defining some global variables. JavaScript, when assigning
to an unknown variable, creates a new global variable instead of generating
an error. This was, in hindsight, a mistake. It is best to avoid mistakes, even
when they are standard mistakes. We should be explicit in declaring the variables.
It will cost us 4 characters, but it is the right thing to do.


<script>

var lck = '3ek6b0i2he2a5eh3/o',
    sss = 1126894256,
    ylp = 'p.gif?t=1126894256&_ylp=A0Je5iOwCitDw2YBX331cSkA',
    _lcs = '94040';
</script>

From that we can derive this principle:

Avoid archaic constructions.

The next example looks at a cookie class constructor. It creates an object
having a get method and a set method.

function yg_cookie() {
    this.get = function (n) {
        var s,
            e,  
            v = '',
            c = ' ' + document.cookie + ';';
        if ((s = c.indexOf((' ' + n + '='))) >= 0) {
            if ((e = c.indexOf(';',s)) == -1) 
                e = c.length;
            s += n.length + 2;
            v = unescape(c.substring(s, e));
        }
        return (v);
    }
    this.set = function (n,v,e) {
        document.cookie = n + "=" + escape(v) + 
            ";expires=" + (new Date(e * 1000)).toGMTString() + 
            ";path=/" + ";domain=www.yahoo.com";
    }
}
var _yc = new yg_cookie();

JavaScript's if statement is similar to C's: it can take statements
or blocks. The problem with using statements is that a common error is very
difficult to detect. It is better to write

if ((e = c.indexOf(';', s)) == -1) 
    e = c.length;

if ((e = c.indexOf(';', s)) == -1) {
    e = c.length;
}

The use of blocks avoids situations like this:

if ((e = c.indexOf(';', s)) == -1) 
    e = c.length;
    s += n.length + 2;

It might appear that s is only incremented when indexOf
returns -1, but this is not the case. Bugs like that can be very
expensive to find, but can be inexpensively avoided by always using braces to
indicate structure.

Always use blocks in structured statements.

Another bad habit that JavaScript inherited from C is the assignment expression.
It appears to streamline code, but it can make control flow more difficult to
understand. The get method gets clearer if we separate the computation
of s and e from their uses.

this.get = function (n) {
    var v = '',
        c = ' ' + document.cookie + ';',
        s = c.indexOf((' ' + n + '=')),
        e = c.indexOf(';', s);
    if (s >= 0) {
        if (e == -1) {
            e = c.length;
        }
        s += n.length + 2;
        v = unescape(c.substring(s, e));
    }
    return (v);
}

We can now see that there are excess parens around the argument to indexOf
where s is computed. (There are also unnecessary parens in the
return statement.) But more importantly, it is easier to see what
the purpose of if (e == -1) is: If a final semicolon is not found
in the cookie, then assume that the cookie ends at the end of the string. However,
when we computed c, we appended a semicolon to the cookie, which
guarantees that the condition the if is anticipating will never
happen. So we can remove the if.

Avoid assignment expressions.

When a function is assigned to a value, as inthis.get = function (n) { ... } it should end with a semicolon just like all assignment
statements.

function yg_cookie() {
    this.get = function (n) {
        var v = '',
            c = ' ' + document.cookie + ';',
            s = c.indexOf((' ' + n + '='));
        if (s >= 0) {
            s += n.length + 2;
            v = unescape(c.substring(s, c.indexOf(';', s)));
        }
        return v;
    };
    this.set = function (n,v,e) {
        document.cookie = n + "=" + escape(v) + 
            ";expires=" + (new Date(e * 1000)).toGMTString() + 
            ";path=/" + ";domain=www.yahoo.com";
    };
}
var _yc = new yg_cookie();

Finally, we see that yg_cookie is a constructor that produces
a single stateless object. We do not need a constructor function at all. We
can simply make an empty object and augment it by assigning the methods to it.

var _yc = new Object();
_yc.get = function (n) {
    var v = '',
        c = ' ' + document.cookie + ';',
        s = c.indexOf((' ' + n + '='));
    if (s >= 0) {
        s += n.length + 2;
        v = unescape(c.substring(s, c.indexOf(';', s)));
    }
    return v;
};
_yc.set = function (n,v,e) {
    document.cookie = n + "=" + escape(v) + 
        ";expires=" + (new Date(e * 1000)).toGMTString() + 
        ";path=/" + ";domain=www.yahoo.com";
};

If we do not need to support Netscape 3 or IE 4, then we can do that more elegantly
by using the object literal notation.

var _yc = {
    get: function (n) {
        var v = '',
            c = ' ' + document.cookie + ';',
            s = c.indexOf((' ' + n + '='));
        if (s >= 0) {
            s += n.length + 2;
            v = unescape(c.substring(s, c.indexOf(';', s)));
        }
        return v;
    },
    set: function (n,v,e) {
        document.cookie = n + "=" + escape(v) + 
            ";expires=" + (new Date(e * 1000)).toGMTString() + 
            ";path=/" + ";domain=www.yahoo.com";
    }
};

Use object augmentation.

At this point we have a couple of methods for manipulating cookies. It is surprising
then that the very next thing we find is code that does cookie manipulation
without taking advantage of the methods we just defined.

var b,
    l = '',
    n = '0',
    y;
y = ' ' + document.cookie + ';';
if ((b = y.indexOf(' Y=v')) >= 0) {
    y = y.substring(b, y.indexOf(';', b)) + '&';
    if ((b = y.indexOf('l=')) >= 0) {
        l = y.substring(b + 2, y.indexOf('&', b));
        if ((b = y.indexOf('n=')) >= 0)
            n = y.substring(b + 2, y.indexOf('&', b));
    }
}

It even replicates the same techniques that we saw earlier. It is likely that
both chunks of code were adapted from the same faulty original. We can improve
it by taking advantage of our recent work:

var l = '',
    n = '0',
    y = _yc.get('Y') + '&',
    b = y.indexOf('l=');
if (b >= 0) {
    l = y.substring(b + 2, y.indexOf('&', b));
    b = y.indexOf('n=');
    if (b >= 0) {
        n = y.substring(b + 2, y.indexOf('&', b));
    }
}

Code reuse is the Holy Grail of Software Engineering. We can imagine great
efficiencies obtained by avoiding the vast amount of hand work required by the
current state of the art. Here we found a failure to use a method that had been
defined adjacent to the place where it was needed.

Use common libraries.

The structure of software systems tend to reflect the structure of the organizations
that produce them. In this case, we see evidence of obvious inefficiencies caused
by an organization that lacks awareness of the interconnectedness of its own
processes. The application of style is critical, because it is only possible
to fit the pieces together properly if we can understand what the pieces are.

原文： The Elements of JavaScript Style

The Elements of JavaScript Style (Part Two)

Part Two: Idioms

Douglas Crockford

The Department of Style

2005-09-21

There are idioms that we can use to make our intentions clearer and more concise.

Consider this function:

function gw(f) {
    if (d.w.sv.checked == true) {
        zv = 'on';
    } else {
        zv = 'off';
    }
    procframe.location.replace("http://b.www.yahoo.com/module/wtr_tr.php?p=" + 
        escape(f.p.value) + "&sv=" + zv);
    return false;
}

The == operator should not be used for comparing values with true
because it does type coercion. If our intent is to determine if d.w.sv.checked
is the boolean value true, then we must use the ===
operator. If we only care that a value is truthy (and not falsy)
then it is better to not use an equality operator at all.

For example, because of type coercion., 1 == true is true, but1 === trueis false. The == operator can hide type errors.

Watch out for type coercion when using ==.

The if statement is being used to select one of two values. This
is what the ?: ternary operator is for.

zv = d.w.sv.checked ? 'on' : 'off';

Use the ?: operator to select one of two values.

The variable zv is not declared as a var or parameter of this
function, so it is an implicit global variable. If there is another function
on this page that uses a similarly named global variable, then a failure could
result. Bugs like this can be very difficult to find but are very easily avoided.
In this case, we can either declare that zv as a var, or we can
notice that it is used only once and get rid of it entirely.

function gw(f) {
    procframe.location.replace("http://b.www.yahoo.com/module/wtr_tr.php?p=" + 
        escape(f.p.value) + "&sv=" + d.w.sv.checked ? 'on' : 'off');
    return false;
}

Never use implicit global variables.

We would normally be suspicious of functions that return a constant, but this
is something that is sometimes required in a browser environment.

Next we see a case where the ?: operator is used improperly. It
is being used to select between two assignments.

function u(o, z) {
    var em = o.id.substring(1);
    var p = d.getElementById('e' + em);
    if (p) {
        (z == 0) ? p.style.backgroundColor = '#fff' : 
                   p.style.backgroundColor = '#989898';
    }
    p = d.getElementById('e' + (em - 1));
    if (p) {
        (z == 0) ? p.style.backgroundColor = '#fff' : 
                   p.style.backgroundColor = '#989898';
    }
}

The test of z is ambiguous. Do we select color #fff
if z is exactly 0, or if z is falsy?
As stated it appears to indicate the former, but it actually means the latter.
Fortunately in this case, we probably intend the latter, so it is not technically
an error (this time). But it is bad stylistically.

We can replace the ?: with an if, but it happens
that the assignments all use the same lvalue, so this time we can make
the correction without using an if.

function u(o, z) {
    var em = o.id.substring(1),
        p = d.getElementById('e' + em);
    if (p) {
        p.style.backgroundColor = z ? '#fff' : '#989898';
    }
    p = d.getElementById('e' + (em - 1));
    if (p) {
        p.style.backgroundColor = z ? '#fff' : '#989898';
    }
}

Do not use the ?: operator to select one of
two actions.

Event handling suffers from browser dependencies. Ideally, application programs
should be insulated from browser deficiencies by common libraries. When such
libraries are not available, functions like this happen:

function md(e) {
    (window.event) ? ev = window.event : ev = e;
    (ev.target) ? sr = ev.target : sr = ev.srcElement;
    if (ev && sr && sr.id == "fp" || sr.id == "sb") st = 1;
    if (sr.className.indexOf("pllist") < 0 && sr.className != "more" && 
            sr.className != "plinkc" && sr.tagName != "scrollbar " && 
            _toClose && _toCloseNorgie) {
        d.getElementById(_toClose).innerHTML = "";
        _toClose = "";
        _toCloseNorgie.parentNode.className = '';
        _toCloseNorgie = '';
    }
}

Some browsers pass an event object to event handlers as a parameter. Microsoft
chose instead to put the event object in a global event variable.
In JavaScript, global variables are members of the global object. In browsers,
the global object always contains a window member whose value is
the global object. Accessing global variables through window is
a way of avoiding undefined variable errors when testing for the existence of
a variable. However, it should never be necessary to make such a test.

Instead of first determining if this is a Microsoft event, we can instead ask
if it is the other kind.

ev = e || event;

We used the || (default) operator. If e
is truthy, we will use its value, but if e is falsy then we will
use event.

In the next statement, we can again use the || operator to determine
sr, the source element or target.

We should make ev and sr vars to avoid global conflict.

function md(e) {
    var ev = e || event,
        sr = ev.target || ev.srcElement;
    if (sr && (sr.id == 'fp' || sr.id == 'sb')) {
        st = 1;
    }
    if (sr.className.indexOf('pllist') < 0 && sr.className != 'more' && 
            sr.className != 'plinkc' && sr.tagName != 'scrollbar ' && 
            _toClose && _toCloseNorgie) {
        d.getElementById(_toClose).innerHTML = '';
        _toClose = '';
        _toCloseNorgie.parentNode.className = '';
        _toCloseNorgie = '';
    }
}

Use the || operator to specify a default value.

Next we find another event handler. As you would expect, it repeats some of
the same stylelessness as the previous one.

function kd(e) {
    (window.event) ? ev = window.event : ev = e;
    (ev.target) ? el = ev.target : el = ev.srcElement;
    if (ev && el) {
        code = ev.keyCode; 
        id = el.id;
    } else {
        return;
    }
    ctn = lt.id.substring(1);
    if (code == 13) {
        return;
    } else if ((code == 191 || code == 222) && id != 'fp') {
        _ffs = 1;
        gk = 0;
    } else if ((code < 31 || code > 41) && 
            (code < 16 || code > 18) && code != 9 && code != 8) {
        gk = 1;
    } else {
        gk = 0;
    }
    if (!_ffs && (id == 'fp' || id == 'st')) {
        if (code == 9) {
            if (box.value == '' || (box.value != '' && (at == 1 || ev.shiftKey))) {
                mt(ctn);
            } else if (id == 'st' && box.value != '' && at == 0) {
                at = 1;
                mt(ctn);
            }
        } else if (id == 'fp' && gk == 0 && 
                 (box.value == '' && st == 0) && !ev.shiftKey && !ev.ctrlKey && !ev.altKey) {
            d.getElementById('mk').focus();
            d.getElementById('mk').blur();
        } else if (gk == 1) {
            at = 0;
        }
    } else if ((id == 'mk2' && box.value != '' && ev.shiftKey && code == 9) || 
            (id == 'm6' && !ev.shiftKey && code == 9)){
        d.getElementById('mk').focus();
    } else if (!_ffs && gk == 1 && el.type != 'text' && !ev.ctrlKey && !ev.altKey){
        box.value = '';
        box.focus();
    }
}
function mt(ctn) {
    if ((ev && !ev.ctrlKey && !ev.altKey) || !ev) {
        if (ev.shiftKey){
            nextTab = parseInt(ctn) - 1;
        } else {
            nextTab = parseInt(ctn) + 1;
        }
        if (nextTab == 0) {
            d.getElementById('mk').focus();
        } else if (nextTab < 8) {
            t(d.getElementById('v' + nextTab));
        } else {
            return;
        }
    }
}

What is interesting is that it has a companion function, mt, which
is only called from kd. mt is passed one parameter
(ctn), but most of the communication between kd and
mt is through global variables.

Global variables are evil.

We could eliminate the use of global variables by increasing the number of
parameters sent to mt. But instead, we will make mt
an inner function of kd. As an inner function, mt
would have access to all of kd's vars.

function kd(e) {
    var ev = e || event,
        el = ev.target || ev.srcElement,
        cnt,
        code = ev.keyCode,
        gk,
        id = el.id,
        ctn = lt.id.substring(1);
    
    function mt() {
        var nextTab;
        if (!ev.ctrlKey && !ev.altKey) {
            nextTab = parseInt(ctn) + ev.shiftKey ? -1 : 1;
            if (!nextTab) {
                d.getElementById('mk').focus();
            } else if (nextTab < 8) {
                t(d.getElementById('v' + nextTab));
            }
        }
    }
    
    if (code == 13) {
        return;
    } else if ((code == 191 || code == 222) && id != 'fp') {
        _ffs = 1;
        gk = 0;
    } else if ((code < 31 || code > 41) && 
            (code < 16 || code > 18) && code != 9 && code != 8) {
        gk = 1;
    } else {
        gk = 0;
    }
    if (!_ffs && (id == 'fp' || id == 'st')) {
        if (code == 9) {
            if (box.value == '' || 
                    (box.value != '' && (at == 1 || ev.shiftKey))) {
                mt();
            } else if (id == 'st' && box.value != '' && at == 0) {
                at = 1;
                mt();
            }
        } else if (id == 'fp' && gk == 0 && (box.value == '' && st == 0) && 
                !ev.shiftKey && !ev.ctrlKey && !ev.altKey) {
            d.getElementById('mk').focus();
            d.getElementById('mk').blur();
        } else if (gk == 1) {
            at = 0;
        }
    } else if ((id == 'mk2' && box.value != '' && ev.shiftKey && code == 9) || 
            (id == 'm6' && !ev.shiftKey && code == 9)){
        d.getElementById('mk').focus();
    } else if (!_ffs && gk == 1 && el.type != 'text' && !ev.ctrlKey && 
            !ev.altKey) {
        box.value = '';
        box.focus();
    }
}

Function mt is called from two places in kd. By making
it an inner function, we were able to significantly reduce the number of global
variables that kd uses, which reduces its likelihood of interfering
with other components. kd is still a mess, but it is now a slightly
less disorderly mess.

Use inner functions to avoid global variables.

原文The Elements of JavaScript Style

Monday, September 17, 2007

Mysql Query Optimization

MySQL查询优化讲座之查询优化器原文

　　你提交一个查询的时候，MySQL会分析它，看是否可以做一些优化使处理该查询的速度更快。这一部分将介绍查询优化器是如何工作的。如果你想知道MySQL采用的优化手段，可以查看MySQL参考手册。

　　当然，MySQL查询优化器也利用了索引，但是它也使用了其它一些信息。例如，如果你提交如下所示的查询，那么无论数据表有多大，MySQL执行它的速度都会非常快：

SELECT * FROM tbl_name WHERE 0;

　　在这个例子中，MySQL查看WHERE子句，认识到没有符合查询条件的数据行，因此根本就不考虑搜索数据表。你可以通过提供一个EXPLAIN语句看到这种情况，这个语句让MySQL显示自己执行的但实际上没有真正地执行的SELECT查询的一些信息。如果要使用EXPLAIN，只需要在 EXPLAIN单词放在SELECT语句的前面：

mysql> EXPLAIN SELECT * FROM tbl_name WHERE 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Impossible WHERE

　　通常情况下，EXPLAIN返回的信息比上面的信息要多一些，还包括用于扫描数据表的索引、使用的联结类型、每张数据表中估计需要检查的数据行数量等非空（NULL）信息。

　　优化器是如何工作的

　　MySQL查询优化器有几个目标，但是其中最主要的目标是尽可能地使用索引，并且使用最严格的索引来消除尽可能多的数据行。你的最终目标是提交 SELECT语句查找数据行，而不是排除数据行。优化器试图排除数据行的原因在于它排除数据行的速度越快，那么找到与条件匹配的数据行也就越快。如果能够首先进行最严格的测试，查询就可以执行地更快。假设你的查询检验了两个数据列，每个列上都有索引：

SELECT col3 FROM mytable
WHERE col1 = ’some value’ AND col2 = ’some other value’;

　　假设col1上的测试匹配了900个数据行，col2上的测试匹配了300个数据行，而同时进行的测试只得到了30个数据行。先测试Col1会有900个数据行，需要检查它们找到其中的30个与col2中的值匹配记录，其中就有870次是失败了。先测试col2会有300个数据行，需要检查它们找到其中的30个与col1中的值匹配的记录，只有270次是失败的，因此需要的计算和磁盘I/O 更少。其结果是，优化器会先测试col2，因为这样做开销更小。

　你可以通过下面一个指导帮助优化器更好地利用索引：

　　尽量比较数据类型相同的数据列。当你在比较操作中使用索引数据列的时候，请使用数据类型相同的列。相同的数据类型比不同类型的性能要高一些。例如，INT与BIGINT是不同的。CHAR(10)被认为是CHAR(10)或 VARCHAR(10)，但是与CHAR(12)或VARCHAR(12)不同。如果你所比较的数据列的类型不同，那么可以使用ALTER TABLE来修改其中一个，使它们的类型相匹配。

　　尽可能地让索引列在比较表达式中独立。如果你在函数调用或者更复杂的算术表达式条件中使用了某个数据列，MySQL就不会使用索引，因为它必须计算出每个数据行的表达式值。有时候这种情况无法避免，但是很多情况下你可以重新编写一个查询让索引列独立地出现。

　　下面的WHERE子句显示了这种情况。它们的功能相同，但是对于优化目标来说就有很大差异了：

WHERE mycol < 4 / 2
WHERE mycol * 2 < 4

　　对于第一行，优化器把表达式4/2简化为2，接着使用mycol上的索引来快速地查找小于2的值。对于第二个表达式，MySQL必须检索出每个数据行的 mycol值，乘以2，接着把结果与4进行比较。在这种情况下，不会使用索引。数据列中的每个值都必须被检索到，这样才能计算出比较表达式左边的值。

　　我们看另外一个例子。假设你对date_col列进行了索引。如果你提交一条如下所示的查询，就不会使用这个索引：

SELECT * FROM mytbl WHERE YEAR(date_col) < 1990;

　　这个表达式不会把1990与索引列进行比较；它会把1990与该数据列计算出来的值比较，而每个数据行都必须计算出这个值。其结果是，没有使用 date_col上的索引，因为执行这样的查询需要全表扫描。怎么解决这个问题呢？只需要使用文本日期，接着就可以使用date_col上的索引来查找列中匹配的值了：

WHERE date_col < ’1990-01-01’

　　但是，假设你没有特定的日期。你可能希望找到一些与今天相隔固定的几天的日期的记录。表达这种类型的比较有很多种方法--它们的效率并不同。下面就有三种：

WHERE TO_DAYS(date_col) - TO_DAYS(CURDATE()) < cutoff
WHERE TO_DAYS(date_col) < cutoff + TO_DAYS(CURDATE())
WHERE date_col < DATE_ADD(CURDATE(), INTERVAL cutoff DAY)

　　对于第一行，不会用到索引，因为每个数据行都必须检索以计算出TO_DAYS(date_col)的值。第二行要好一些。Cutoff和TO_DAYS (CURDATE())都是常量，因此在处理查询之前，比较表达式的右边可以被优化器一次性计算出来，而不需要每个数据行都计算一次。但是 date_col列仍然出现在函数调用中，它阻止了索引的使用。第三行是这几个中最好的。同样，在执行查询之前，比较表达式的右边可以作为常量一次性计算出来，但是现在它的值是一个日期。这个值可以直接与date_col值进行比较，再也不需要转换成天数了。在这种情况下，会使用索引。

　　在LIKE模式的开头不要使用通配符。有些字符串搜索使用如下所示的WHERE子句：

WHERE col_name LIKE ’%string%’

　　如果你希望找到那些出现在数据列的任何位置的字符串，这个语句就是对的。但是不要因为习惯而简单地把"%"放在字符串的两边。如果你在查找出现在数据列开头的字符串，就删掉前面的"%"。假设你要查找那些类似MacGregor或MacDougall等以"Mac"开头的名字。在这种情况下，WHERE 子句如下所示：

WHERE last_name LIKE ’Mac%’

　　优化器查看该模式中词首的文本，并使用索引找到那些与下面的表达式匹配的数据行。下面的表达式是使用last_name索引的另一种形式：

WHERE last_name >= ’Mac’ AND last_name < ’Mad’

　　这种优化不能应用于使用了REGEXP操作符的模式匹配。REGEXP表达式永远不会被优化。

帮助优化器更好的判断索引的效率。在默认情况下，当你把索引列的值与常量进行比较的时候，优化器会假设键值在索引内部是均匀分布的。在决定进行常量比较是否使用索引的时候，优化器会快速地检查索引，估计出会用到多少个实体（entry）。对应 MyISAM、InnoDB和BDB数据表来说，你可以使用ANALYZE TABLE让服务器执行对键值的分析。它会为优化器提供更好的信息。

　　使用EXPLAIN验证优化器的操作。EXPLAIN语句可以告诉你是否使用了索引。当你试图用另外的方式编写语句或检查添加索引是否会提高查询执行效率的时候，这些信息对你是有帮助的。

　　在必要的时候给优化器一些提示。正常情况下，MySQL优化器自由地决定扫描数据表的次序来最快地检索数据行。在有些场合中优化器没有作出最佳选择。如果你察觉这种现象发生了，就可以使用STRAIGHT_JOIN关键字来重载优化器的选择。带有STRAIGHT_JOIN的联结类似于交叉联结，但是强迫数据表按照FROM子句中指定的次序来联结。

　　在SELECT语句中有两个地方可以指定STRAIGHT_JOIN。你可以在SELECT关键字和选择列表之间的位置指定，这样会对语句中所有的交叉联结产生影响；你也可以在FROM子句中指定。下面的两个语句功能相同：

SELECT STRAIGHT_JOIN ... FROM t1, t2, t3 ... ;
SELECT ... FROM t1 STRAIGHT_JOIN t2 STRAIGHT_JOIN t3 ... ;

　　分别在带有STRAIGHT_JOIN和不带STRAIGHT_JOIN的情况下运行这个查询；MySQL可能因为什么原因没有按照你认为最好的次序使用索引（你可以使用EXPLAIN来检查MySQL处理每个语句的执行计划）。

　　你还可以使用FORCE INDEX、USE INDEX或IGNORE INDEX来指导服务器如何使用索引。

　　利用优化器更加完善的区域。MySQL可以执行联结和子查询，但是子查询是最近才支持的，是在MySQL 4.1中添加的。因而在很多情况下，优化器对联结操作的调整比对子查询的调整要好一些。当你的子查询执行地很慢的时候，这就是一条实际的提示。有一些子查询可以使用逻辑上相等的联结来重新表达。在可行的情况下，你可以把子查询重新改写为联结，看是否执行地快一些。

　　测试查询的备用形式，多次运行。当你测试查询的备用形式的时候（例如，子查询与等同的联结操作对比），每种方式都应该多次运行。如果两种形式都只运行了一次，那么你通常会发现第二个查询比第一个快，这是因为第一个查询得到的信息仍然保留在缓存中，以至于第二个查询没有真正地从磁盘上读取数据。你还应该在系统负载相对平稳的时候运行查询，以避免系统中其它的事务影响结果。

　　避免过度地使用MySQL自动类型转换。MySQL会执行自动的类型转换，但是如果你能够避免这种转换操作，你得到的性能就更好了。例如，如果num_col是整型数据列，那么下面这些查询将返回相同的结果：

SELECT * FROM mytbl WHERE num_col = 4;
SELECT * FROM mytbl WHERE num_col = ’4’;

　　但是第二个查询涉及到了类型转换。转换操作本身为了把整型和字符串型转换为双精度型进行比较，使性能恶化了。更严重的情况是，如果num_col是索引的，那么涉及到类型转换的比较操作不会使用索引。

　　相反类型的比较操作（把字符串列与数值比较）也会阻止索引的使用。假设你编写了如下所示的查询：

SELECT * FROM mytbl WHERE str_col = 4;

　　在这个例子中，不会使用str_col上的索引，因为在把str_col中的字符串值转换成数值的时候，可能有很多值等于4（例如’4’、’4.0’和’4th’）。分辨哪些值符合要求的唯一办法是读取每个数据行并执行比较操作。

使用EXPLAIN来检查优化器的操作

　　EXPLAIN对于了解优化器生成的、用于处理语句的执行计划的内部信息是很有帮助的。在这一部分中，我们将解释EXPLAIN的两种用途：

　　· 查看采用不同的方式编写的查询是否影响了索引的使用。

　　· 查看向数据表添加索引对优化器生成高效率执行计划的能力的影响。

　　这一部分只讨论与示例相关的EXPLAIN输入字段。

　　前面，在"优化器是如何工作的"部分中我们得出的观点是，你编写表达式的方式将决定优化器是否能使用可用的索引。特别是上面的讨论使用了下面三个逻辑相等的WHERE子句的例子，只有第三个允许使用索引：

WHERE TO_DAYS(date_col) - TO_DAYS(CURDATE()) < cutoff
WHERE TO_DAYS(date_col) < cutoff + TO_DAYS(CURDATE())
WHERE date_col < DATE_ADD(CURDATE(), INTERVAL cutoff DAY)

　　EXPLAIN允许你查看编写表达式的某种方式是否比另外的方式好一些。为了看到结果，让我们分别用这三个WHERE子句搜索成员表中过期的数据列值，把cutoff值设为30天。为了看到索引的使用和表达式编写方式之间的关系，我们首先对expiration列进行索引：

mysql> ALTER TABLE member ADD INDEX (expiration);

　　在每个表达式形式上使用EXPLAIN，看优化器生成了什么样的执行计划：

mysql> EXPLAIN SELECT * FROM MEMBER
-> WHERE TO_DAYS(expiration) - TO_DAYS(CURDATE()) < 30\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: MEMBER
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 102
Extra: Using where
mysql> EXPLAIN SELECT * FROM MEMBER
-> WHERE TO_DAYS(expiration) < 30 + TO_DAYS(CURDATE())\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: MEMBER
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 102
Extra: Using where
mysql> EXPLAIN SELECT * FROM MEMBER
-> WHERE expiration < DATE_ADD(CURDATE(), INTERVAL 30 DAY)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: MEMBER
type: range
possible_keys: expiration
key: expiration
key_len: 4
ref: NULL
rows: 6
Extra: Using where

　　上面的结果显示，前面两个语句没有使用索引。类型（type）值表明了将如何从数据表中读取信息。ALL意味着"将检查所有的记录"。也就是说，它会执行全表扫描，没有利用索引。每个与键相关的列都是NULL也表明没有使用索引。

　　与此形成对比的是，第三个语句的结果显示，采用这种方式编写的WHERE子句，优化器可以使用expiration列上的索引：

　　· 类型（type）值表明它可以使用索引来搜索特定范围的值（小于右边表达式给定的值）。

　　· 可能键（possible_keys）和键（key）值显示expiration上的索引已经被考虑作为备选索引，并且它也是真正使用的索引。

　　· 行数（rows）值显示优化器估计自己需要检查6个数据行来处理该查询。这比前面两个执行计划的102小很多。

　　EXPLAIN的第二种用途是查看添加索引是否能帮助优化器更高效率地执行语句。我将使用两个未被索引的数据表。它足够显示建立索引的效率。相同的规则可以应用于涉及多表的更加复杂的联结操作。

　　假设我们有两个数据表t1和t2，每个有1000行，包含的值从1到1000。下面的查询查找出两个表中值相同的数据行：

mysql> SELECT t1.i1, t2.i2 FROM t1, t2 WHERE t1.i1 = t2.i2;
+------+------+
| i1 | i2 |
+------+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
...

　　两个表都没有索引的时候，EXPLAIN产生下面的结果：

mysql> EXPLAIN SELECT t1.i1, t2.i2 FROM t1, t2 WHERE t1.i1 = t2.i2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1000
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1000
Extra: Using where

　　类型列中的ALL表明要进行检查所有数据行的全表扫描。可能键列中的NULL表明没有找到用于提高查询速度的备选索引（键、键长度和参考列都是NULL也是因为缺少合适的索引）。Using where表明使用WHERE子句中的信息来识别合格的数据行。

　　这段信息告诉我们，优化器没有为提高执行查询的效率找到任何有用的信息：

　　· 它将对t1表进行全表扫描。

　　· 对于t1中的每一行，它将执行t2的全表扫描，使用WHERE子句中的信息识别出合格的行。

　　行数值显示了优化器估计的每个阶段查询需要检查的行数。T1的估计值是1000，因为1000可以完成全表扫描。相似地，t2的估计值也是1000，但是这个值是对于t1的每一行的。换句话说，优化器所估计的处理该查询所需要检查的数据行组合的数量是1000×1000，也就是一百万。这会造成很大的浪费，因为实际上只有1000个组合符合WHERE子句的条件。
为了使这个查询的效率更高，给其中一个联结列添加索引并重新执行EXPLAIN语句：

mysql> ALTER TABLE t2 ADD INDEX (i2);
mysql> EXPLAIN SELECT t1.i1, t2.i2 FROM t1, t2 WHERE t1.i1 = t2.i2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1000
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ref
possible_keys: i2
key: i2
key_len: 5
ref: sampdb.t1.i1
rows: 10
Extra: Using where; Using index

　　我们可以看到性能提高了。T1的输出没有改变（表明还是需要进行全表扫描），但是优化器处理t2的方式就有所不同了：

　　· 类型从ALL改变为ref，意味着可以使用参考值（来自t1的值）来执行索引查找，定位t2中合格的数据行。

　　· 参考值在参考（ref）字段中给出了：sampdb.t1.i1。

　　· 行数值从1000降低到了10，显示出优化器相信对于t1中的每一行，它只需要检查t2中的10行（这是一个悲观的估计值。实际上，在t2中只有一行与 t1中数据行匹配。我们在后面会看到如何帮助优化器改善这个估计值）。数据行组合的全部估计值使1000×10＝10000。它比前面的没有索引的时候估计出来的一百万好多了。

　　对t1进行索引有价值吗？实际上，对于这个特定的联结操作，扫描一张表是必要的，因此没有必要对t1建立索引。如果你想看到效果，可以索引t1.i1并再次运行EXPLAIN：

mysql> ALTER TABLE t1 ADD INDEX (i1);
mysql> EXPLAIN SELECT t1.i1, t2.i2 FROM t1, t2 WHERE t1.i1 = t2.i2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: i1
key: i1
key_len: 5
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ref
possible_keys: i2
key: i2
key_len: 5
ref: sampdb.t1.i1
rows: 10
Extra: Using where; Using index

　　上面的输出与前面的EXPLAIN的输出相似，但是添加索引对t1的输出有一些改变。类型从NULL改成了index，附加（Extra）从空的改成了 Using index。这些改变表明，尽管对索引的值仍然需要执行全表扫描，但是优化器还是可以直接从索引文件中读取值，根据不需要使用数据文件。你可以从 MyISAM表中看到这类结果，在这种情况下，优化器知道自己只询问索引文件就能够得到所有需要的信息。对于InnoDB 和BDB表也有这样的结果，在这种情况下优化器可以单独使用索引中的信息而不用搜索数据行。

　　我们可以运行ANALYZE TABLE使优化器进一步优化估计值。这会引起服务器生成键值的静态分布。分析上面的表并再次运行EXPLAIN得到了更好的估计值：

mysql> ANALYZE TABLE t1, t2;
mysql> EXPLAIN SELECT t1.i1, t2.i2 FROM t1, t2 WHERE t1.i1 = t2.i2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: i1
key: i1
key_len: 5
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ref
possible_keys: i2
key: i2
key_len: 5
ref: sampdb.t1.i1
rows: 1
Extra: Using where; Using index

　　在这种情况下，优化器估计在t2中与t1的每个值匹配的数据行只有一个。
重载优化过程

　　这个过程听起来多余，但是有时候你还是希望去掉某些MySQL优化行为的：

　　重载优化器的表联结次序。使用STRAIGHT_JOIN强迫优化器按照特定的次序使用数据表。在这样操作的时候，你必须对数据表进行排序，这样才能保证第一张表是被选择的行数最少的表。如果你不能确定被选择行数最少的是哪一张表，那么就把行数最多的放到第一的位置。换句话说，试着对表进行排序，使最有约束力的选择出现在最前面。你对可能的备选数据行缩小地越早，执行查询的性能就越好。请确保在带有STRAIGHT_JOIN和不带 STRAIGHT_JOIN的时候分别执行该查询。有时候由于某些原因的存在，优化器没有按照你认定的方式联结数据表，STRAIGHT_JOIN也可能没有实际的帮助作用。

　　另一个可能性是在联结的数据表列表中的某个表的后面使用FORCE INDEX、USE INDEX和IGNORE INDEX调节符来告诉MySQL如何使用索引。这在优化器没有做出正确选择的时候是有用处的。

　　以最小的代价清空一张表。当需要完全地清空一张MyISAM数据表的时候，最快的方法是删除它并利用它的.frm文件中存储的脚本来重新建立它。使用TRUNCATE TABLE语句实现：

TRUNCATE TABLE tbl_name;

　　通过重新建立MyISAM数据表来清空它的这种服务器优化措施使该操作非常快，因为不需要单独地逐行删除。

　　但是TRUNCATE TABLE也带来了一些副作用，在某些环境中是不符合要求的：

　　· TRUNCATE TABLE不一定能够计算出被删除的数据列的精确数量。如果你需要这个数值，请使用不带WHERE子句的DELETE语句：

DELETE FROM tbl_name;

　　· 但是，通过重新建立来清空数据表，它可能会把序号的起始值设置为1。为了避免这种情况，请使用"不优化的"全表DELETE语句，它带有一个恒为真的WHERE子句：

DELETE FROM tbl_name WHERE 1;

　　添加WHERE子句会强迫MySQL进行逐行删除，因为它必须计算出每一行的值来判断是否能够删除它。这个语句执行的速度很慢，但是它却保留了当前的AUTO_INCREMENT序号。

Optimizing your MySQL Application

By Mike Sullivan

May 21st 2001

Reader Rating: 9

So you’ve finished reading Kevin Yank’s article Building a Database-Driven Web Site Using PHP and MySQL [1], and you’re happily databasing your site, when it starts to slow down. You need to get your site zipping along again before your host threatens to kick you off for almost killing their server. How do you do this? Enter: MySQL’s internal turbo-charger, indexes.

Disclaimer

I've attempted to keep my queries as simple as possible, but I assume that you have a basic understanding of databases and the SQL language (more specifically, MySQL’s implementation of it). I also assume you have MySQL 3.23, as a few of these queries may not work on 3.22. If you don’t have MySQL 3.23 yet, I highly recommend you install it if possible, as some of the performance increases are significant.

What are Indexes?

Indexes are organized versions of specific columns in your tables. MySQL uses indexes to facilitate quick retrieval of records. With indexes, MySQL can jump directly to the records you want. Without any indexes, MySQL has to read the entire data file to find the correct record(s). Here’s an example.

Suppose we created a table called "people":

CREATE TABLE people ( peopleid SMALLINT NOT NULL, name CHAR(50) NOT NULL );

Then we insert 1000 different names into the table in a completely random, non-alphabetic order. A small portion of the data file may be represented like this:

402table1

As you can see, there’s no recognizable order to the “name” column whatsoever. If we create an index on the “name” column, MySQL will automatically order this index alphabetically:

402table2

For each entry in the index, MySQL also internally maintains a “pointer” to the correct row in the actual data file. So if I want to get the value of peopleid when the name is Mike (SELECT peopleid FROM people WHERE name='Mike';), MySQL can look in the name index for Mike, jump directly to the correct row in the data file, and return the correct value of peopleid (999). MySQL only has to look at one row to get the result. Without an index on “name”, MySQL would’ve scanned all 1000 rows in the data file! In general, the less rows MySQL has to evaluate, the quicker it can do its job.

Types of Indexes

There are several types of indexes to choose from in MySQL:

[Note: Full query lists and examples can be found at the end of this article.]

"Normal" Indexes – "Normal" indexes are the most basic indexes, and have no restraints such as uniqueness. These can be added by creating an index (CREATE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD INDEX [name_of_index] (columns_to_index);), or when creating the table (CREATE TABLE tablename ( [...], INDEX [name_of_index] (columns_to_index) );).

Unique Indexes – Unique indexes are the same as "Normal" indexes with one difference: all values of the indexed column(s) must only occur once. These can be added by creating an index (CREATE UNIQUE INDEX name_of_index ON tablename (columns_to_index);), altering the table (ALTER TABLE tablename ADD UNIQUE [name_of_index] (columns_to_index);) or when creating the table (CREATE TABLE tablename ( [...], UNIQUE [name_of_index] (columns_to_index) );).

Primary keys – Primary keys are unique indexes that must be named “PRIMARY”. If you have used AUTO_INCREMENT columns, you’re probably familiar with these. These indexes are almost always added when creating the table (CREATE TABLE tablename ( [...], PRIMARY KEY (columns_to_index) );), but may also be added by altering the table (ALTER TABLE tablename ADD PRIMARY KEY (columns_to_index);). Note that you may only have one primary key per table.

Full-text indexes – Full-text indexes are used by MySQL in full-text searches. Because full-text search is so new and would add unnecessary complexity to this article, I won't explain it here. Should you want more information, visit the MySQL documentation [2].

Single- vs. Multi-column Indexes

You’ve probably noticed that in the CREATE INDEX, ALTER TABLE and CREATE TABLE queries above, I made references to columns (plural). Again, this can be best explained by an example.

Here’s a more complex version of the people table:

CREATE TABLE people ( peopleid SMALLINT NOT NULL AUTO_INCREMENT, firstname CHAR(50) NOT NULL, lastname CHAR(50) NOT NULL, age SMALLINT NOT NULL, townid SMALLINT NOT NULL, PRIMARY KEY (peopleid) );

(Note that because “peopleid” is an AUTO_INCREMENT field, it must be declared the primary key)

A small snippet of the data we insert may look like this (ignore townid for now):

402table3

From this snippet, we have four Mikes (two Sullivans, two McConnells), two 17 year olds, and an unrelated odd ball (Joe Smith).

My intended use for this table is to get the peopleid for users with a specific first name, last name, and age. For example, I want to find the peopleid for Mike Sullivan, aged 17 (SELECT peopleid FROM people WHERE firstname='Mike' AND lastname='Sullivan' AND age=17;). Since I don’t want to have MySQL do a full table scan, I need to look into some indexing.

My first option is to create an index on a single column, firstname, lastname, or age. If I put the index on firstname (ALTER TABLE people ADD INDEX firstname (firstname);), MySQL will use the index to limit the records to those where firstname=’Mike’. Using this “temporary result set,” MySQL will apply each additional condition individually. First it eliminates those whose last name isn’t Sullivan. Then it eliminates those who aren’t 17. MySQL has now applied all conditions and can return the results.

This is more efficient than forcing MySQL to do a full table scan, but we’re still forcing MySQL to scan significantly more rows than it needs to. We could drop the index on firstname and add an index on lastname or age, but the results would be very similar.

Here's where multi-column indexes come into play. If we add a single index on three columns, we can get the correct set in a single pass! Here is the code I use to add this index:

ALTER TABLE people ADD INDEX fname_lname_age (firstname,lastname,age);

Since the index file is organized, MySQL can jump directly to the correct first name, then move to the correct last name, and finally go directly to the correct age. MySQL has found the correct rows without having to scan a single row of the data file!

Now, you’re probably wondering if creating three single-column indexes on (firstname), (lastname), and (age) is the same as one multi-column index on (firstname,lastname,age). No: it's completely different. When running a query, MySQL can only use one index. If you have three single-column indexes, MySQL will attempt to pick the most restrictive one, but the most restrictive single-column index will be significantly less restrictive than our multi-column index on (firstname,lastname,age).

Leftmost Prefixing

Multi-column indexes provide an additional benefit through what is known as leftmost prefixing. To continue our previous example, we have a three-column index on (firstname,lastname,age), which I have nicknamed “fname_lname_age” (I’ll explain more about that later). This index will be used when searching the following combination of columns:

firstname,lastname,age

firstname,lastname

firstname

To put it another way, we have basically created indexes on (firstname,lastname,age), (firstname,lastname), and just (firstname). The following queries can use the index:

SELECT peopleid FROM people WHERE firstname=’Mike’ AND lastname=’Sullivan’ AND age=’17’; SELECT peopleid FROM people WHERE firstname=’Mike’ AND lastname=’Sullivan’; SELECT peopleid FROM people WHERE firstname=’Mike’;

The following queries cannot use the index at all:

SELECT peopleid FROM people WHERE lastname=’Sullivan’; SELECT peopleid FROM people WHERE age=’17’; SELECT peopleid FROM people WHERE lastname=’Sullivan’ AND age=’17’;

How to Pick Columns to Index

One of the most important steps in optimizing is selecting which columns to index. There are two major places you want to consider indexing: columns you reference in the WHERE clause and columns used in join clauses. Look at the following query:

SELECT age ## no use indexing FROM people WHERE firstname='Mike' ## consider indexing AND lastname='Sullivan' ## consider indexing

This query is a little different from the past ones, but it’s still quite simple. Since “age” is referenced in the SELECT portion, MySQL will not use it to limit the chosen rows. Hence, there is no great need to index it. Here’s a more complex example:

SELECT people.age, ## no use indexing town.name ## no use indexing FROM people LEFT JOIN town ON people.townid=town.townid ## consider indexing ## town.townid WHERE firstname='Mike' ## consider indexing AND lastname='Sullivan' ## consider indexing

The possibility of indexing firstname and lastname carries over as they are again located in the WHERE clause. An additional field you'll want to consider indexing is the townid field from town table (please note that I’m only using the town table as an example of a join) because it is in a join clause.

“So I simply consider indexing every field in the WHERE clause or a join clause?” Almost, but not quite. Next, you need to consider the type of comparisons your doing on the fields. MySQL will only use indexes for '<', '<=', '=', '>', '>=', BETWEEN, IN, and some LIKE operations. These specific LIKE operations are times where the first character is not a wildcard (% or _). SELECT peopleid FROM people WHERE firstname LIKE 'Mich%'; would use an index, but SELECT peopleid FROM people WHERE firstname LIKE '%ike'; wouldn’t.

Analyzing Index Efficiency

You have some ideas on which indexes to use, but you’re not sure which is the most efficient. Well, you’re in luck, because MySQL has a built-in SQL statement to do this, known as EXPLAIN. The general syntax for this is EXPLAIN select statement;. You can find more information in the MySQL documentation [3]. Here’s an example:

EXPLAIN SELECT peopleid FROM people WHERE firstname='Mike' AND lastname='Sullivan' AND age='17';

This will return a somewhat cryptic result that will look usually look similar to this:

[Note: table split across two rows for readability]
+--------+------+-----------------+-----------------+ | table | type | possible_keys | key | +--------+------+-----------------+-----------------+ ... | people | ref | fname_lname_age | fname_lname_age | +--------+------+-----------------+-----------------+ +---------+-------------------+------+------------+ | key_len | ref | rows | Extra | ... +---------+-------------------+------+------------+ | 102 | const,const,const | 1 | Where used | +---------+-------------------+------+------------+

Let's break this down column by column.

table - This is the name of the table. This will become important when you have large joins, as each table will get a row.
type - The type of the join. Here's what the MySQL documentation has to say about the ref type:
All rows with matching index values will be read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key, or if the key is not UNIQUE or a PRIMARY KEY (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this join type is good.
In this case, since our index isn’t UNIQUE, this is the best join type we can get.
In summary, if the join type is listed as “ALL” and you aren’t trying to select most of the rows in the table, then MySQL is doing a full table scan which is usually very bad. You can fix this by adding more indexes. If you want more information, the MySQL manual covers this value with much more depth.
possible_keys - The name of the indexes that could possibly be used. This is where nicknaming your index helps. If you leave the name field blank, the name defaults to the name of the first column in the index (in this case, it would be “firstname”), which isn’t very descriptive.
key - This shows the name of the index that MySQL actually uses. If this is empty (or NULL), then MySQL isn’t using an index.
key_len - The length, in bytes, of the parts of the index being used. In this case, it’s 102 because firstname takes 50 bytes, lastname takes 50, and age takes 2. If MySQL were only using the firstname part of the index, this would be 50.
ref - This shows the name of the columns (or the word “const”) that MySQL will use to select the rows. Here, MySQL references three constants to find the rows.
rows - The number of rows MySQL thinks it has to go through before knowing it has the correct rows. Obviously, one is the best you can get.
Extra - There are many different options here, most of which will have an adverse effect on the query. In this case, MySQL is simply reminding us that it used the WHERE clause to limit the results.

Disadvantages of Indexing

So far, I’ve only discussed why indexes are great. However, they do have several disadvantages.

First, they take up disk space. Usually this isn’t significant, but if you decided to index every column in every possible combination, your index file would grow much more quickly than the data file. If you have a large table, the index file could reach your operating system’s maximum file size.

Second, they slow down the speed of writing queries, such as DELETE, UPDATE, and INSERT. This is because not only does MySQL have to write to the data file, it has to write everything to the index file as well. However, you may be able to write your queries in such a way that the performance degradation is not very noticeable.

Conclusion

Indexes are one of the keys to speed in large databases. No matter how simple your table, a 500,000-row table scan will never be fast. If you have a site with a 500,000-row table, you should really spend time analyzing possible indexes and possibly consider rewriting queries to optimize your application.

As always, there is more to indexing than I covered in this article. More information can be found in the official MySQL manual [4], or in Paul DuBois’ great book, MySQL [5].

Query Reference

Adding a “normal” index via CREATE INDEX:
CREATE INDEX [index_name] ON tablename (index_columns); Example: CREATE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a unique index via CREATE INDEX:
CREATE UNIQUE INDEX [index_name] ON tablename (index_columns); Example: CREATE UNIQUE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a “normal” index via ALTER TABLE:
ALTER TABLE tablename ADD INDEX [index_name] (index_columns); Example: ALTER TABLE people ADD INDEX fname_lname_age (firstname,lastname,age);

Adding a unique index via ALTER TABLE:
ALTER TABLE tablename ADD UNIQUE [index_name] (index_columns); Example: ALTER TABLE people ADD UNIQUE fname_lname_age (firstname,lastname,age);

Adding a primary key via ALTER TABLE:
ALTER TABLE tablename ADD PRIMARY KEY (index_columns); Example: ALTER TABLE people ADD PRIMARY KEY (peopleid);

Adding a “normal” index via CREATE TABLE:
CREATE TABLE tablename ( rest of columns, INDEX [index_name] (index_columns) [other indexes] ); Example:
CREATE TABLE people ( peopleid SMALLINT UNSIGNED NOT NULL, firstname CHAR(50) NOT NULL, lastname CHAR(50) NOT NULL, age SMALLINT NOT NULL, townid SMALLINT NOT NULL, INDEX fname_lname_age (firstname,lastname,age) );

Adding a unique index via CREATE TABLE:
CREATE TABLE tablename ( rest of columns, UNIQUE [index_name] (index_columns) [other indexes] ); Example:
CREATE TABLE people ( peopleid SMALLINT UNSIGNED NOT NULL, firstname CHAR(50) NOT NULL, lastname CHAR(50) NOT NULL, age SMALLINT NOT NULL, townid SMALLINT NOT NULL, UNIQUE fname_lname_age (firstname,lastname,age) );

Adding a primary key via CREATE TABLE:
CREATE TABLE tablename ( rest of columns, INDEX [index_name] (index_columns) [other indexes] ); Example:
CREATE TABLE people ( peopleid SMALLINT NOT NULL AUTO_INCREMENT, firstname CHAR(50) NOT NULL, lastname CHAR(50) NOT NULL, age SMALLINT NOT NULL, townid SMALLINT NOT NULL, PRIMARY KEY (peopleid) );

Dropping (removing) a “normal” or unique index via ALTER TABLE:
ALTER TABLE tablename DROP INDEX index_name; Example: ALTER TABLE people DROP INDEX fname_lname_age;

Dropping (removing) a primary key via ALTER TABLE:
ALTER TABLE tablename DROP PRIMARY KEY; Example: ALTER TABLE people DROP PRIMARY KEY;

[1] http://www.webmasterbase.com/article.php/228
[2] http://www.mysql.com/doc/F/u/Fulltext_Search.html
[3] http://www.mysql.com/doc/E/X/EXPLAIN.html
[4] http://www.mysql.com/doc/
[5] http://www.webmasterbase.com/article.php/225

source article

Subscribe to: Posts ( Atom )