Subject: bug#1654: 23.0.60; auto encoding detection (detect-coding-region) not working



In article <ukwsabwo8x.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, poppyer
<poppyer@xxxxxxxxx> writes:

> But for big5, the list returned by
> "(detect-coding-region (region-beginning) (region-end))"
> does not contain big5. I do understand that GBK and Big5 sequences might
> not be easy to distinguish, but in this case both encodings are
> compatible with the input text, so both should be in the returned list.
> Am I right?

You are right. But the current Emacs can't have both GBK
and Big5 in the list of coding systems to try during
detection, because they belong to the same coding-system
category (i.e. charset-base). I know that this restriction
is not good, and improving it is on my todo list, but I
still don't have time to work on it.
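
To see which candidates detection actually offers for a region, something
like the following can be used. It is only a sketch: the function name
my-coding-detected-p is made up for illustration, and it compares base
coding systems so that aliases and end-of-line variants do not matter.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-coding-detected-p (coding start end)
  "Return non-nil if CODING is among the coding systems detected
for the text between START and END."
  (let ((base (coding-system-base coding))
        (found nil))
    ;; Without the optional HIGHEST argument, `detect-coding-region'
    ;; returns a list of candidate coding systems.
    (dolist (cs (detect-coding-region start end))
      (when (coding-system-equal base (coding-system-base cs))
        (setq found t)))
    found))
```

With the restriction described above, (my-coding-detected-p 'big5
(region-beginning) (region-end)) is expected to return nil for text
that was detected as GBK.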

> BTW, is there any hook that I can run after the coding detection? I might
> want to write a small Lisp function to distinguish BIG5 and GBK (by char
> statistics, for example).

We don't have such a hook, but I think you can use
after-insert-file-functions when reading a file. When that
hook is called, the buffer already contains text decoded
by buffer-file-coding-system. You can re-decode the newly
inserted text like this:

(defun check-gbk-big5 (nchars)
  ;; Called with point at the start of the NCHARS inserted characters.
  (if (and enable-multibyte-characters
           (not coding-system-for-read)
           (coding-system-equal
            'chinese-gbk (coding-system-base buffer-file-coding-system)))
      (let* ((pos (point))
             (end (+ pos nchars))
             (modified (buffer-modified-p)))
        (when (search-forward "\x5201" end t) ;; (*1)
          (save-restriction
            (goto-char pos)
            (narrow-to-region pos end)
            ;; Turn the text back into bytes, then decode them as Big5.
            (encode-coding-region pos end buffer-file-coding-system)
            (decode-coding-region pos (point-max) 'big5)
            (set-buffer-file-coding-system last-coding-system-used)
            (set-buffer-modified-p modified)
            ;; The hook must return the new number of inserted characters.
            (setq nchars (- (point-max) (point-min)))))))
  nchars)

(add-hook 'after-insert-file-functions 'check-gbk-big5)

You can replace the (*1) part with your own check function.
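
As one possible shape for such a check, here is a rough sketch of a
statistics-based guess. Everything in it is an assumption made for
illustration: the names my-common-hanzi, my-count-common-hanzi and
my-guess-gbk-or-big5 are hypothetical, the character list is a tiny
hand-picked sample rather than real frequency data, and ties are
resolved in favor of GBK. It decodes the same bytes both ways and
prefers the decoding that yields more very common characters.

```elisp
;; Hypothetical sample of frequent characters (simplified and
;; traditional forms); a real check would want a much larger table.
(defconst my-common-hanzi
  (string-to-list "的是不了人在有我他中大上這这個个們们來来為为國国說说時时"))

(defun my-count-common-hanzi (string)
  "Return how many characters of STRING are in `my-common-hanzi'."
  (let ((count 0))
    (dotimes (i (length string))
      (when (memq (aref string i) my-common-hanzi)
        (setq count (1+ count))))
    count))

(defun my-guess-gbk-or-big5 (bytes)
  "Guess the encoding of the unibyte string BYTES.
Return the symbol `big5' or `chinese-gbk'."
  (let ((as-gbk (my-count-common-hanzi
                 (decode-coding-string bytes 'chinese-gbk)))
        (as-big5 (my-count-common-hanzi
                  (decode-coding-string bytes 'big5))))
    (if (> as-big5 as-gbk) 'big5 'chinese-gbk)))
```

Inside the hook, the bytes could be recovered with
(encode-coding-string (buffer-substring pos end) buffer-file-coding-system),
and the (*1) test would then become
(eq (my-guess-gbk-or-big5 bytes) 'big5). This relies on the GBK decoding
being reversible, the same assumption the encode-coding-region call makes.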

---
Kenichi Handa
handa@xxxxxxxx





