Subject: bug#1654: 23.0.60; auto encoding detection
(detect-coding-region) not working
In article <ukwsabwo8x.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, poppyer
> But for big5, the list returned by
> "(detect-coding-region (region-beginning) (region-end))"
> does not contain big5. I do understand that gbk and big5's sequences might
> not be easy to distinguish, but in this case, both encodings are
> compatible with the input literal text, so both should be in the returned list.
> Am I right?
You are right. But the current Emacs can't have both GBK
and Big5 in the list of coding systems to try for detection
because they are in the same coding-system category
(i.e. charset-base). I know that this restriction is not
good, and improving it is on my todo list, but I still don't
have time to work on it.
> BTW, is there any hook that I can run after the coding detection? I might
> want to write a small Lisp function to distinguish BIG5 and GBK (by char
> statistics, for example).
We don't have such a hook, but I think you can use
after-insert-file-functions when reading a file. When that
hook is called, the buffer already contains text decoded
with buffer-file-coding-system. You can re-decode the newly
inserted text like this:
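(defun check-gbk-big5 (nchars)
  ;; Re-decode newly inserted text as Big5 when the GBK decoding looks wrong.
  (if (and enable-multibyte-characters
           (eq 'chinese-gbk (coding-system-base buffer-file-coding-system)))
      (let* ((pos (point))
             (end (+ pos nchars)))
        (save-excursion
          (when (search-forward "\x5201" end t) ;; (*1)
            (save-restriction
              (narrow-to-region pos end)
              ;; Recover the original bytes, then decode them as Big5.
              (encode-coding-region pos end buffer-file-coding-system)
              (decode-coding-region pos (point-max) 'big5)
              (setq nchars (- (point-max) pos)))))))
  ;; The hook must return the new number of inserted characters.
  nchars)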
(add-hook 'after-insert-file-functions 'check-gbk-big5)
You can replace the (*1) part with your own check function.
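
For example, a statistics-based check for the (*1) part could look
roughly like the following. This is only a sketch: the names
my-suspect-count and my-gbk-looks-wrong-p, the SUSPECTS string, and
the THRESHOLD value are made up for illustration and are not part of
Emacs. Which characters really indicate a wrong GBK decoding is
something you would have to work out from your own data.

```elisp
;; Hypothetical helpers; the names, SUSPECTS, and THRESHOLD are
;; illustrative assumptions, not an Emacs API.
(defun my-suspect-count (beg end suspects)
  "Count characters between BEG and END that occur in the string SUSPECTS."
  (let ((chars (append suspects nil))   ; string -> list of characters
        (hits 0)
        (pos beg))
    (while (< pos end)
      (when (memq (char-after pos) chars)
        (setq hits (1+ hits)))
      (setq pos (1+ pos)))
    hits))

(defun my-gbk-looks-wrong-p (beg end suspects threshold)
  "Return non-nil if at least THRESHOLD chars from SUSPECTS occur in BEG..END."
  (>= (my-suspect-count beg end suspects) threshold))
```

With these, the test at (*1) could become something like
(my-gbk-looks-wrong-p pos end "\x5201" 1). Unlike search-forward,
this check does not move point.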