【1-1-16-2-2 Converting between Unicode Chinese characters and GB internal codes】萍聚社区-德国热线-德国实用信息网INFO天空

文选流氓 posted on 2003-5-11 22:31


From: intranetworm (小虫), Board: Java
Title: Converting between Unicode Chinese characters and GB internal codes
Site: BBS 水木清华站 (Wed Aug 27 13:44:45 1997)

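This is a conversion program I wrote. To use it, first save the code table given earlier to a file, e.g. table.txt,
and create a GBUnicode instance with new GBUnicode("table.txt").
After that, call GB2Uni and Uni2GB to convert between the two encodings. Note that a GB code is represented by two bytes.
The constructor expects the table to list the 6763 GB2312 hanzi in GB order, each entry being the character's two GB bytes, then a ':', then its Unicode value in decimal.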
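The code table referred to above is not part of this page. As a hedged sketch, assuming the JDK in use provides the "GB2312" charset (full JDK builds normally do; `Charset.isSupported("GB2312")` checks this), a file the constructor can parse could be generated from the JDK's own decoder. `MakeGBTable` and `writeTable` are hypothetical names, and the line layout (two GB bytes, " : ", decimal Unicode value, newline) is inferred from the parsing code, not copied from the original table.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original post): writes a code table in the
// layout the GBUnicode constructor parses, using the JDK's GB2312 decoder as the source.
class MakeGBTable {
    static final Charset GB = Charset.forName("GB2312"); // assumes this charset is installed

    // Writes one line per hanzi, in GB order: <two GB bytes> " : " <decimal Unicode> "\n".
    // Returns the number of entries written (6763 for GB2312).
    static int writeTable(OutputStream out) throws IOException {
        int count = 0;
        for (int hi = 0xB0; hi <= 0xF7; hi++) {                // level 1: B0-D7, level 2: D8-F7
            int lastLow = (hi == 0xD7) ? 0xF9 : 0xFE;          // row D7 stops after 89 hanzi
            for (int lo = 0xA1; lo <= lastLow; lo++) {
                byte[] code = { (byte) hi, (byte) lo };
                char uni = new String(code, GB).charAt(0);      // decode one GB2312 hanzi
                out.write(code);                                // the two raw GB bytes
                out.write((" : " + (int) uni + "\n").getBytes(StandardCharsets.US_ASCII));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("table.txt"))) {
            System.out.println(writeTable(out) + " entries written to table.txt");
        }
    }
}
```

Running `main` once leaves a table.txt in the working directory that can then be passed to `new GBUnicode("table.txt")`.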

import java.io.*;
import java.util.Hashtable;

class GBUnicode{
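   //GB2312 has 6763 hanzi (3755 level-1 + 3008 level-2); entry n of each
   //array describes the n-th hanzi in GB order
   byte high[]=new byte[6763],low[]=new byte[6763]; //GB high/low byte
   char unichar[]=new char[6763];                   //Unicode value
   Hashtable<Character,Integer> UniGB;              //Unicode -> n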

   public GBUnicode(String table_file) throws IOException
   {
            int i, n=0;
            byte b, bl, bh, num[]=new byte[5]; //decimal Unicode value, at most 5 digits

            UniGB=new Hashtable<Character,Integer>(7000,1);
            try (InputStream tables=new BufferedInputStream(new FileInputStream(table_file))) {
                     while (n<6763) {
                              do {
                                       bh=readByte(tables);
                              } while ((char)bh<=' '); //find first non-blank char
                              bl=readByte(tables);
                              high[n]=bh; //the two GB bytes of hanzi number n
                              low[n]=bl;
                              do {
                                       b=readByte(tables);
                              } while (b!=(byte)':'); //find ':'
                              do {
                                       b=readByte(tables);
                              } while ((char)b<=' '); //find next non-blank char to read the number
                              i=0;
                              int c=b;
                              while (c>='0' && c<='9' && i<num.length) {
                                       num[i++]=(byte)c;
                                       c=tables.read(); //-1 at end of file also stops the loop
                              }
                              unichar[n]=(char)Integer.parseInt(new String(num,0,i,"US-ASCII"));
                              if (UniGB.get(unichar[n])!=null)
                                       System.out.println("Duplicated : "+unichar[n]);
                              UniGB.put(unichar[n],n);
                              n=n+1;
                     }
            }
   }

   //read one byte, failing instead of looping forever if the table is cut short
   private static byte readByte(InputStream in) throws IOException
   {
            int c=in.read();
            if (c==-1)
                     throw new EOFException("code table ended early");
            return (byte)c;
   }

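   //position n (0..6762) of a GB2312 hanzi in the table, or -1.
   //Each row holds 94 codes (low byte 0xA1-0xFE); level 1 uses rows
   //0xB0-0xD7, where row 0xD7 has only 89, and level 2 uses rows 0xD8-0xF7.
   private int getGBindex(byte high,byte low){
            int i=(high&0xff)-0xb0; //row, counted from 0xB0
            int j=(low&0xff)-0xa1;  //column, counted from 0xA1
            if (i<0 || i>71 || j<0 || j>93) //not a GB hanzi position
                     return -1;
            if (i<39) //L1 Chinese
                     return (i*94+j);
            if (i==39) //one of the last 89 L1 Chinese
                     return (j>88) ? -1 : (i*94+j);
            return (i*94+j-5); //L2 Chinese: skip the 5 unused codes of row 0xD7
   }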

   public byte[] Uni2GB(char unicode) {
            Integer index=(Integer)UniGB.get(unicode);
            if (index==null) //not one of the GB hanzi
                     return null;
            byte ch[]=new byte[2]; //a GB code is two bytes
            ch[0]=high[index];
            ch[1]=low[index];
            return ch;
   }

   public char GB2Uni(byte high, byte low) {
            int index=getGBindex(high,low);
            if (index ==-1) //not GB Chinese
                     return 0;
            return unichar[index];
   }
}
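getGBindex relies on the GB2312 layout: 94 codes per row (low byte 0xA1-0xFE), 3755 level-1 hanzi in rows 0xB0-0xD7 with only 89 in row 0xD7, and 3008 level-2 hanzi in rows 0xD8-0xF7, which is why level-2 positions subtract 5. The standalone sketch below (the class `GBIndex` and its methods are hypothetical, not part of the original program) restates that arithmetic on unsigned byte values, adds the inverse mapping, and checks that the two agree for all 6763 positions.

```java
// Hypothetical helper, not from the original post: the GB2312 position
// arithmetic used by getGBindex, plus its inverse.
class GBIndex {
    // (high, low) as unsigned values 0-255 -> position n in 0..6762, or -1
    static int index(int high, int low) {
        int i = high - 0xB0, j = low - 0xA1;
        if (i < 0 || i > 71 || j < 0 || j > 93) return -1;
        if (i < 39) return i * 94 + j;                  // level 1, rows 0xB0-0xD6
        if (i == 39) return j > 88 ? -1 : i * 94 + j;   // row 0xD7 holds only 89 hanzi
        return i * 94 + j - 5;                          // level 2, rows 0xD8-0xF7
    }

    // position n -> {high, low}; re-inserts the 5 unused slots at the end of row 0xD7
    static int[] bytes(int n) {
        if (n < 0 || n >= 6763) throw new IllegalArgumentException("n = " + n);
        int k = (n < 3755) ? n : n + 5;
        return new int[] { 0xB0 + k / 94, 0xA1 + k % 94 };
    }

    public static void main(String[] args) {
        for (int n = 0; n < 6763; n++) {
            int[] b = bytes(n);
            if (index(b[0], b[1]) != n) throw new AssertionError("mismatch at " + n);
        }
        System.out.println("round trip OK");
    }
}
```

Working on unsigned `int` values (the original's `high&0xff` equivalent) avoids reasoning about Java's signed `byte` when subtracting 0xB0 and 0xA1.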

--
※ Source: ·BBS 水木清华站 bbs.net.tsinghua.edu.cn·
