CN101047733B - Short message processing method and device - Google Patents

Short message processing method and device

Publication number
CN101047733B
Authority
CN
China
Prior art keywords
character
short message
characters
module
coded system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2006100914427A
Other languages
Chinese (zh)
Other versions
CN101047733A (en
Inventor
唐志雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN2006100914427A
Publication of CN101047733A
Application granted
Publication of CN101047733B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a short message processing method and a short message processing device. Before a short message is sent, the method comprises the steps of: analyzing the types of the characters in the short message; within the maximum length that can be sent in one short message, selecting an encoding scheme that can encode all character types within that length and whose level is lower than that of the highest-level encoding scheme; and encoding the characters within that length with the selected encoding scheme to compose one short message. The invention can reduce the number of segments into which a long short message is split, improve space utilization, and lower both the cost to the user and the load on the system.

Figure 200610091442

Description

Short message processing method and device

Technical field

The present invention relates to a communication method and device, and in particular to a short message processing method and device.

Background

Global informatization offers great prospects for short message (SM) communication in mobile networks. According to statistics released by China's Ministry of Information Industry on May 21, 2006, 132.25 billion mobile phone short messages were sent from January to April 2006, an increase of 46.5% over the same period of the previous year.

The rapid growth of short message traffic, on top of an already large base, shows both that short messaging has clear technical advantages and that the market for it is large. Because short messaging has become a routine means of communication for mobile users, users bear a cost for it, and that cost has grown quickly as usage has surged. Long short messages are one of the reasons for this added cost. A long short message is one whose content exceeds the maximum capacity of a single short message and must therefore be sent as at least two short messages. By contrast, a short message whose entire content can be sent at once is called a non-long short message.

Prior-art short messages are generally encoded in PDU (Protocol Data Unit) mode using 7-bit, 8-bit, or UCS2 encoding. Only one encoding scheme can be used within a single short message. 7-bit encoding is used to send ordinary ASCII characters: it packs a string of 7-bit characters (most significant bit 0) into 8-bit data, so that every 8 characters are "compressed" into 7 bytes. 8-bit encoding is usually used to send data messages such as pictures and ringtones. UCS2 encoding is used to send Unicode characters.
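The "8 characters into 7 bytes" packing can be illustrated with a minimal sketch. It assumes the usual least-significant-bit-first septet packing and that the input is already a sequence of 7-bit codes; the function name `pack_septets` is hypothetical and does not come from the patent.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, filling each octet from its low bits."""
    bit_buffer = 0   # bits waiting to be written out
    bit_count = 0    # how many bits are waiting
    out = bytearray()
    for septet in septets:
        bit_buffer |= (septet & 0x7F) << bit_count
        bit_count += 7
        while bit_count >= 8:
            out.append(bit_buffer & 0xFF)
            bit_buffer >>= 8
            bit_count -= 8
    if bit_count:
        out.append(bit_buffer & 0xFF)  # final partial octet, zero padded
    return bytes(out)


# ASCII letters have the same codes in the 7-bit alphabet,
# so a bytes object of letters can stand in for a septet sequence.
print(pack_septets(b"hellohello").hex())
print(len(pack_septets(bytes(8))), len(pack_septets(bytes(160))))
```

Eight septets occupy 56 bits, which is exactly 7 octets, and 160 septets fill the 140-octet payload.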

Under the existing short message protocol, the user data (TP-UD, TP-User-Data) field of a PDU holds at most 140 bytes, so the maximum number of characters in one non-long short message under the three encoding schemes is 160, 140, and 70 respectively. Whatever the content or character types of the message, if this maximum is exceeded the content is split into several messages for sending. Here, one English letter, one Chinese character, and one data byte each count as one character.

Of the three schemes, UCS2 covers the widest range and can encode every character. The 7-bit scheme covers fewer than 140 characters; its character set consists of the standard 7-bit set and the extended 7-bit set. An extended 7-bit character must be preceded by an escape character before it can be encoded. The escape character is itself a standard 7-bit character and is used only when an extended 7-bit character is present. The 8-bit scheme covers only 256 characters.

In PDU mode, under the existing short message protocol, once a long short message has been split, the maximum number of characters that a single sub-message can carry is:

1) with 7-bit encoding: 153 characters;

2) with 8-bit encoding: 134 characters;

3) with UCS2 encoding: 67 characters.

Compared with the maxima of 160, 140, and 70 characters for a non-long short message, each sub-message of a split long short message carries a few characters fewer under every scheme, because that space is used to hold identification information for the long short message.
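The character limits above follow from simple arithmetic on the payload size. The sketch below reproduces them under one assumption that the text does not state: that the identification information of a split message occupies 6 bytes of the 140-byte payload. The names `max_chars` and `CONCAT_HEADER_BYTES` are illustrative only.

```python
PAYLOAD_BYTES = 140        # TP-UD capacity given in the text
CONCAT_HEADER_BYTES = 6    # assumption: space taken by the identification information

BITS_PER_CHAR = {"7bit": 7, "8bit": 8, "ucs2": 16}


def max_chars(scheme, segmented):
    """Largest whole number of characters that fits in one message."""
    payload = PAYLOAD_BYTES - (CONCAT_HEADER_BYTES if segmented else 0)
    return payload * 8 // BITS_PER_CHAR[scheme]


for scheme in BITS_PER_CHAR:
    print(scheme, max_chars(scheme, False), max_chars(scheme, True))
```

With that assumption the segmented payload is 134 bytes, or 1072 bits, which matches the 153, 134, and 67 character limits listed above.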

Under 7-bit, 8-bit, and UCS2 encoding, each character occupies the following space in memory:

1) under 7-bit encoding, every seven bits form one character;

2) under 8-bit encoding, every eight bits form one character;

3) under UCS2 encoding, every sixteen bits form one character.

Extended 7-bit characters are a special case. Under 7-bit encoding, an extended 7-bit character must be preceded by an escape character, so two 7-bit characters take part in encoding and 14 bits are occupied. Under 8-bit encoding, if the extended 7-bit character falls within the 8-bit range, only the character itself is encoded and it occupies 8 bits; if it falls outside the 8-bit range, it can only be encoded in UCS2 and occupies 16 bits. In this document, therefore, the encoding levels of prior-art short messages are, from low to high, 7-bit, 8-bit, and UCS2. UCS2 is the highest level, and the 8-bit level is higher than the 7-bit level except in the case above where the character is outside the 8-bit range.
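The per-character costs described above can be put into a small helper. This is a sketch, not the patent's procedure: the function name `stored_bits` and its parameters are hypothetical, and it assumes every counted character is representable in the chosen scheme unless it is passed as `other`.

```python
BITS_PER_UNIT = {"7bit": 7, "8bit": 8, "ucs2": 16}


def stored_bits(scheme, standard=0, extended=0, other=0):
    """Bits needed for a run of characters under a single encoding scheme.

    standard: count of standard 7-bit characters
    extended: count of extended 7-bit characters (escape + character in 7-bit)
    other:    count of characters outside the 7-bit sets
    """
    if scheme == "7bit":
        if other:
            raise ValueError("7-bit encoding cannot represent these characters")
        return 7 * standard + 14 * extended
    return BITS_PER_UNIT[scheme] * (standard + extended + other)


print(stored_bits("7bit", standard=10, extended=2))
print(stored_bits("8bit", standard=10, extended=2))
print(stored_bits("ucs2", standard=10, extended=2))
```

Because an extended character costs 14 bits in 7-bit encoding but only 8 bits in 8-bit encoding, a run with many extended characters is not always shortest at the lowest level.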

Under the three schemes, the characters carried by one sub-message of a long short message therefore occupy:

1) with 7-bit encoding: 153 × 7 = 1071 bits;

2) with 8-bit encoding: 134 × 8 = 1072 bits;

3) with UCS2 encoding: 67 × 16 = 1072 bits.

The maximum length in memory of one sub-message of a long short message is therefore 1072 bits, whereas a non-long short message occupies at most 1120 bits (160 × 7, 140 × 8, or 70 × 16) under any of the three schemes. If the content of a short message needs more than 1120 bits, the message can be judged to be a long short message.

Under the existing short message protocol, a non-long short message uses only one encoding scheme. When a message contains characters that no single lower-level scheme can handle, for example when 7-bit characters and UCS2 characters coexist in one message, the scheme with the widest character range, UCS2, is used so that all of the content can be encoded. The storage wasted by 16-bit UCS2 encoding is then unavoidable: characters that could have been encoded in 7 or 8 bits are all stored as 16-bit Unicode characters, which in the most extreme case doubles the storage required and increases the number of segments. This raises the cost to the user and also adds to the load on the communication network.

For example, suppose a message contains 160 characters, of which the first 150 are 7-bit characters and the last 10 are Unicode characters. If UCS2 is simply applied throughout, each character is allotted 16 bits, and the storage required is 160 × 16 = 2560 bits. For a long short message each sub-message holds at most 1072 bits, so these 160 characters must be sent as 3 sub-messages (67 + 67 + 26).

Clearly, when character types are mixed in this way, simply applying UCS2 to the whole long short message can greatly reduce the space utilization of each segment.
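The figures in this example can be checked directly. The snippet below only repeats the arithmetic from the text; the variable names are illustrative.

```python
import math

SEGMENT_BITS = 1072        # capacity of one sub-message, from the text
UCS2_BITS = 16
UCS2_SEGMENT_CHARS = 67    # UCS2 characters per sub-message, from the text

total_chars = 160
ucs2_bits = total_chars * UCS2_BITS
segments = math.ceil(total_chars / UCS2_SEGMENT_CHARS)
last_segment_chars = total_chars - (segments - 1) * UCS2_SEGMENT_CHARS

# The first 150 characters would need only 7 bits each under 7-bit encoding.
seven_bit_part = 150 * 7

print(ucs2_bits, segments, last_segment_chars)
print(seven_bit_part, seven_bit_part <= SEGMENT_BITS)
```

The 150 seven-bit characters alone need 1050 bits, which is less than one 1072-bit sub-message, so most of the space taken by uniform UCS2 encoding is overhead.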

Summary of the invention

The technical problem to be solved by the present invention is to provide a short message processing method that reduces the number of segments into which a long short message is split.

A further technical problem to be solved by the present invention is to provide a short message processing device that reduces the number of segments into which a long short message is split.

To solve the first technical problem, the invention provides a short message processing method comprising, before the short message is sent, the following steps. The types of the characters in the short message are analyzed: characters are read one by one from the message to be sent; the encoding scheme of the current character is compared with the saved scheme, that is, the scheme just sufficient to encode all characters read so far; if the current character's scheme is of a higher level, it replaces the saved scheme; and the character offset of the current character is recorded. The number of characters that currently need to take part in encoding is counted according to the saved scheme, and the storage length those characters require is calculated from the saved scheme and that count. If the storage length is less than or equal to the maximum length that can be sent in one short message, the recorded character offset and the saved scheme are stored separately, and the method returns to the step of reading a character. If the storage length is greater than the maximum length and the message type is long short message, the separately stored scheme is selected and the separately stored character offset is used as the cut point of the long short message. The characters within the maximum length are then encoded with the selected scheme to compose one short message.

The maximum length is determined as follows: the message type is determined; for a long short message the maximum length is 1072 bits, otherwise it is 1120 bits.

The step of determining the message type comprises: determining whether the occupied storage length is less than or equal to 1120 bits; if it is and all characters have been read, the message is judged to be a non-long short message; otherwise it is judged to be a long short message.

When the occupied storage length is less than or equal to 1072 bits and unread characters remain, the method further comprises separately storing the character offset read and the encoding scheme; this encoding scheme is used to encode the characters when the message is determined to be a non-long short message.

When the occupied storage length is greater than 1072 bits and the message is judged to be a long short message, the method further comprises returning the recorded character offset and encoding scheme; the character offset serves as the cut point of the long short message and the encoding scheme is used to encode the characters.

Before the character types are analyzed, the method further comprises initializing the message type to non-long short message. When the message is judged to be a long short message, reading of characters continues, and each time the maximum length for one transmission of a long short message is reached again, the characters within that length are cut off.

When a character is found to be of a type that can only be encoded in UCS2, the length of the whole message is checked. If the whole message is longer than 70 characters, it is judged to be a long short message; it is then determined whether the character offset is greater than 67 characters and, if so, the 67 characters counting backward from the character before this one are cut off and encoded.

The selected encoding scheme is the lowest-level scheme capable of encoding all character types within that length.
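The method summarized above can be approximated by a greedy splitter. This is a simplified sketch and not the patented flow itself. It assumes a hypothetical classifier `level_of` that treats code points below 0x80 as 7-bit, below 0x100 as 8-bit, and everything else as UCS2. It ignores extended 7-bit characters and characters outside the Basic Multilingual Plane, and it uses the 1120-bit and 1072-bit limits from the text.

```python
BITS = {1: 7, 2: 8, 3: 16}   # level -> bits per character (1 = 7-bit, 3 = UCS2)


def level_of(ch):
    """Hypothetical classifier; the real character tables are more detailed."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x100:
        return 2
    return 3


def split_message(text):
    """Return a list of (level, chunk) pairs, one per message to send."""
    if not text:
        return []
    whole_level = max(level_of(c) for c in text)
    if len(text) * BITS[whole_level] <= 1120:      # fits as a non-long message
        return [(whole_level, text)]
    segments, start = [], 0
    while start < len(text):
        level = 1                                  # reset to the lowest level
        saved_end, saved_level = start, 1
        for i in range(start, len(text)):
            level = max(level, level_of(text[i]))  # keep the highest level seen
            if (i - start + 1) * BITS[level] > 1072:
                break                              # over budget: cut at saved point
            saved_end, saved_level = i + 1, level  # still fits: remember this point
        segments.append((saved_level, text[start:saved_end]))
        start = saved_end
    return segments


mixed = "a" * 150 + "\u4e2d" * 10
print([(level, len(chunk)) for level, chunk in split_message(mixed)])
```

For the 160-character example from the background section, this sketch sends the first 150 characters in one 7-bit message and the last 10 in one UCS2 message, two messages instead of three.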

To solve the second technical problem, the invention provides a short message processing device comprising a character type analysis module, an encoding scheme determination module, and an encoding module. The character type analysis module comprises a character reading module and a character type and encoding determination module; the encoding scheme determination module comprises an encoding scheme comparison module and an encoding scheme storage module. The character reading module reads characters one by one from the message to be sent. The character type and encoding determination module determines the type of each character and its encoding scheme. The encoding scheme comparison module compares the current character's encoding scheme with the scheme just sufficient to encode all characters read so far; if the current character's scheme is of a higher level, it is saved, and the character reading module continues reading until the storage length of the characters read reaches the maximum length. The encoding scheme determination module selects the saved scheme within the maximum length that can be sent in one short message, and the encoding module encodes the characters within that length with the selected scheme to compose one short message.

The device further comprises a message type determination module, which comprises a character counting module, a storage length calculation module, and a storage length determination module. The character counting module counts, according to the saved encoding scheme, the number of characters that currently need to take part in encoding. The storage length calculation module calculates, from the saved scheme and the current count, the total storage length those characters require. The storage length determination module determines whether the occupied storage length is less than or equal to 1120 bits; if it is and all characters have been read, the message is judged to be a non-long short message, otherwise a long short message.

The message type determination module further comprises a module for separately storing the character offset and encoding scheme. When the occupied storage length is less than or equal to 1072 bits and unread characters remain, it separately stores the character offset read and the encoding scheme; this scheme is used to encode the characters when the message is determined to be a non-long short message.

The device further comprises a long short message cutting module. When the occupied storage length is greater than 1072 bits and the message is judged to be a long short message, this module uses the recorded character offset and encoding scheme to set the cut point of the long short message and to encode the characters, respectively.

When the character type analysis module determines that a character is of the UCS2 type, the character offset recording module records the length of the whole message. If the whole message is longer than 70 characters, the message type determination module judges it to be a long short message; and if the character offset recorded by the character offset recording module is greater than 67 characters, the long short message cutting module cuts off the 67 characters counting backward from the character before this one, and the encoding module encodes them.

As the first technical solution shows, before a long short message containing a mix of character types is sent, the invention encodes each segment with a better-suited encoding scheme instead of encoding the whole message with one scheme. It dynamically analyzes the type of each character to decide which scheme an independently sendable short message should use, dynamically counts the maximum number of characters that can be encoded under that scheme, and chooses the scheme that allows the most characters to be sent in one independent short message, so that each independently sent message carries as much content as possible. Applying the method repeatedly to the long short message means that several encoding schemes and split points may be used across the whole content. This avoids the many transmissions, poor space utilization, and higher user cost that result from simply applying a single scheme to the whole message, and so reduces the number of segments, improves space utilization, and lowers both the cost to the user and the load on the system.

As the second technical solution shows, before a long short message containing a mix of character types is sent, the invention uses the encoding scheme determination module and the message type determination module to find a better-suited scheme that can encode all characters of a single sub-message, instead of encoding the whole message with one scheme. The character type analysis module dynamically analyzes the type of each character and, together with the encoding scheme determination module, decides which scheme an independently sendable short message should use, dynamically counts the maximum number of characters that can be encoded under that scheme, and chooses the scheme that allows the most characters to be sent in one independent short message, so that each independently sent message carries as much content as possible. This avoids the many transmissions, poor space utilization, and higher user cost that result from simply applying a single scheme to the whole message, and so reduces the number of segments, improves space utilization, and lowers both the cost to the user and the load on the system.

Description of drawings

Fig. 1 is a flowchart of the first embodiment of the short message processing method of the invention;

Fig. 2 is a flowchart of determining the character type and encoding scheme;

Fig. 3 is a flowchart of counting the characters that take part in encoding;

Fig. 4 is a flowchart of the second embodiment of the short message processing method of the invention;

Fig. 5 is a block diagram of the short message processing device of the invention.

Detailed description

The basic principle of the invention is as follows. Before a long short message containing a mix of character types is sent, the invention avoids simply applying a single encoding scheme to the whole message, which would lead to poor space utilization and higher cost to the user. While complying with the existing protocol, the invention splits the long message content by dynamically analyzing the type of each character to decide which scheme an independently sendable short message should use, dynamically counting the maximum number of characters that can be encoded under that scheme, and choosing the scheme that allows the most characters to be sent in one independent short message, so that each independently sent message carries as much content as possible. Applying this repeatedly to the long short message means that several encoding schemes and split points may be used across the whole content, which reduces the number of segments, improves space utilization, and lowers messaging charges.

Based on this principle, the invention provides several embodiments. To illustrate the invention fully, embodiments are given for a message of unknown type, for a message known to be long, and for fast processing of the content of a long short message. The invention is described in detail below with reference to the embodiments and the drawings.

Referring to Fig. 1, the following is the flow for splitting a short message that has been composed, so that it can be sent once splitting is complete. The first embodiment of the invention splits the message when it is not known whether the composed message is a long short message:

A. Initialize the encoding scheme of the message to be sent to the lowest level, that is, 7-bit encoding; initialize the message type to non-long short message; and set the extended 7-bit character count to 0.

Initialization presets the encoding scheme to the lowest level and the message type to non-long regardless of the content. If splitting shows that all of the content belongs to the lowest-level encoding, no higher-level scheme is needed, which saves storage space and reduces the number of segments. If the message is finally judged to be long, it may be split several times. Step A of every split re-initializes the encoding scheme to the lowest level, but the message type flag is initialized only once: with multiple splits, only the first split initializes it.
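One way to picture step A is a small state record whose per-split fields are reset while the message type flag is left alone. The class and field names below are hypothetical and only mirror the three values named in step A.

```python
from dataclasses import dataclass


@dataclass
class SplitState:
    is_long: bool = False     # message type flag, initialized once per message
    level: int = 1            # 1 = 7-bit, the lowest encoding level
    extended_count: int = 0   # number of extended 7-bit characters seen

    def start_split(self):
        """Reset the per-split fields; is_long is deliberately left untouched."""
        self.level = 1
        self.extended_count = 0


state = SplitState()
state.is_long, state.level, state.extended_count = True, 3, 4
state.start_split()
print(state)
```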

Steps B, C, and D below form the flow for analyzing the character types in the message.

B. Read one character from the content of the message to be sent and determine whether it is the end-of-text marker. If it is, update the end flag and go directly to step M; otherwise continue with the following steps.

Analysis starts from the first character. If the first character is already the end of the message, the message is empty. In later iterations, if the character indicates the end of the message, no further analysis is needed and the loop ends.

C. Record the offset of the character just read.

The character offset is measured relative to the first character read in the current split: the first character read has offset 1, and the n-th character read has offset n. Characters are read in steps of 16 bits, because all characters of the 7-bit, extended 7-bit, 8-bit, and UCS2 types have already been converted to Unicode characters beforehand. In the subsequent splitting flow, the character offset is the basis for choosing the split point. After each split, the character offset is reset to zero.
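Reading in 16-bit steps with a 1-based offset could look like the sketch below. It assumes the message buffer holds big-endian UTF-16 code units, which the text does not specify; `read_units` is a hypothetical name.

```python
import struct


def read_units(text):
    """Yield (offset, code_unit) pairs, with the offset starting at 1."""
    data = text.encode("utf-16-be")   # assumption: 16-bit big-endian units
    for offset, (unit,) in enumerate(struct.iter_unpack(">H", data), start=1):
        yield offset, unit


for offset, unit in read_units("A\u4e2d"):
    print(offset, hex(unit))
```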

D. Determine the type and encoding scheme of the character.

That is, determine which encoding type the character belongs to (standard 7-bit, extended 7-bit, 8-bit, or UCS2) and from that determine whether 7-bit, 8-bit, or UCS2 encoding applies.
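A classifier for step D might be sketched as follows. The character tables are transcribed from the GSM default alphabet and its extension table and should be checked against the specification. Treating code points up to 0xFF as the 8-bit class is an assumption, since the text only says that the 8-bit range has 256 characters. The name `classify` is hypothetical.

```python
# Standard 7-bit set; the escape code itself is left out because it is not text.
GSM_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extended 7-bit set: each of these needs an escape character in 7-bit encoding.
GSM_EXTENDED = frozenset("^{}\\[~]|€\f")


def classify(ch):
    """Return 'standard7', 'extended7', '8bit', or 'ucs2' for one character."""
    if ch in GSM_BASIC:
        return "standard7"
    if ch in GSM_EXTENDED:
        return "extended7"
    if ord(ch) <= 0xFF:       # assumption: the 8-bit range is code points 0..255
        return "8bit"
    return "ucs2"


for ch in "A€â中":
    print(repr(ch), classify(ch))
```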

The purpose of steps E to K below is as follows. Within the maximum length that a single short message can carry, and provided that this span contains no character that can only be encoded with the highest-level scheme, select an encoding scheme of lower level than the highest one that can still encode every character type in the span. If a UCS2-only character is present, the independent short message containing it is encoded with UCS2.

E. Compare with the previously saved higher-level encoding scheme. If the scheme required by this character is of a higher level, replace the saved scheme with the current one.

The purpose of this step is as follows. Each independently sent short message can use only one encoding scheme, and that scheme must be able to encode all of its characters. Therefore, while the characters are being analyzed, the scheme of any higher-level character encountered is recorded. This yields the highest-level scheme required by the characters in the message, which in turn makes it easy to calculate the storage length occupied by the message content.

The detailed flow of this step is shown in Figure 2:

E1. Determine whether the saved higher-level encoding scheme is UCS2. If so, go to step E2; otherwise go to step E3.

E2. Return the UCS2 encoding scheme directly.

UCS2 is the highest-level scheme and can encode any character, and once it is required the message must be encoded with UCS2. There is therefore no need to analyze the type and scheme of the character just read: UCS2 is returned and the analysis ends.

E3. Determine whether the saved higher-level scheme is the 8-bit scheme. If so, go to step E4; otherwise go to step E6.

The 8-bit scheme ranks above the 7-bit scheme. Checking whether the saved scheme is 8-bit is the precondition for the comparison with this character's scheme that follows.

E4. Determine whether the character's Unicode (international character set) code value is less than or equal to 255. If so, go to step E5; otherwise go to step E2.

When the saved higher-level scheme is the 8-bit scheme and the character's code value is less than or equal to 255, the level of this character's scheme cannot exceed that of the 8-bit scheme. The flow therefore goes to step E5 and returns the 8-bit scheme, and the saved scheme does not need to be replaced. If the code value is greater than 255, the character cannot be encoded with the 8-bit scheme and can only be encoded with UCS2, so the flow goes to step E2.

E5. Return the 8-bit encoding scheme; the analysis ends.

E6. Determine whether the character is a standard 7-bit character. If so, go to step E7; otherwise go to step E8.

When the saved higher-level scheme is neither UCS2 nor 8-bit, the character may be an 8-bit, standard 7-bit or extended 7-bit character. The character is therefore looked up in the standard 7-bit character set, and if it is found the flow goes to step E7.

E7. Return the 7-bit encoding scheme; the analysis ends.

At this step the saved higher-level scheme and the scheme of this character agree, so the 7-bit scheme is returned and the saved scheme does not need to be replaced.

E8. Determine whether the character is an extended 7-bit character. If so, go to step E9; otherwise go to step E10.

Since the character may be an extended 7-bit character, it is looked up in the extended 7-bit character set, and if it is found the flow goes to step E9.

E9. Record the extended 7-bit type; the analysis ends.

The purpose of this step is not to replace the saved higher-level scheme. It provides the basis for deciding, when the characters participating in encoding are counted later, whether that count must be increased by one.

E10. Determine whether the character's Unicode code value is less than or equal to 255. If the code value is greater than 255, go to step E2; otherwise go to step E11.

At this step the character may require UCS2 encoding, so its code value is checked. If it is greater than 255, the flow goes to step E2 and returns UCS2; otherwise it goes to step E11.

E11. Determine whether any of the characters analyzed so far in the current segmentation pass has a code value greater than 255. If so, go to step E2; otherwise go to step E5.

Once it is established that this character does not itself require UCS2, the characters already analyzed in the current pass are checked for any code value greater than 255. If there is one, the saved higher-level scheme must be replaced with UCS2; otherwise the 8-bit scheme is returned. This completes the comparison with the previously saved higher-level scheme.
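The decision chain E1 to E11 can be condensed into one function. This is a sketch under stated assumptions: the name `update_scheme`, the numeric constants, and the convention of passing the two character sets and a `seen_over_255` flag as arguments are all hypothetical and are not taken from the patent.

```python
# Hypothetical constants: each scheme is represented by its bits per unit.
SEVEN_BIT, EIGHT_BIT, UCS2 = 7, 8, 16


def update_scheme(saved, ch, std7, ext7, seen_over_255=False):
    """Return the scheme to keep after reading ch (steps E1 to E11).

    saved          scheme saved so far in this segmentation pass
    std7, ext7     sets of standard / extended 7-bit characters (assumed inputs)
    seen_over_255  True if an earlier character in this pass has code > 255
    """
    code = ord(ch)
    if saved == UCS2:                       # E1 -> E2
        return UCS2
    if saved == EIGHT_BIT:                  # E3 -> E4
        return EIGHT_BIT if code <= 255 else UCS2   # E5 or E2
    if ch in std7:                          # E6 -> E7
        return SEVEN_BIT
    if ch in ext7:                          # E8 -> E9 (scheme unchanged)
        return SEVEN_BIT
    if code > 255:                          # E10 -> E2
        return UCS2
    return UCS2 if seen_over_255 else EIGHT_BIT     # E11 -> E2 or E5
```

The `seen_over_255` flag covers the E11 case: a 7-bit character such as a Greek capital has a code value above 255, so a later 8-bit character forces the whole span up to UCS2 rather than to the 8-bit scheme.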

F. Count the number of characters that currently need to participate in encoding, according to the higher-level encoding scheme.

The purpose of this step is to count, in each iteration of the analysis loop, how many units the characters read so far contribute under the current higher-level scheme, so that the count can later be used to decide whether the message is a long message. This count may differ from the character offset recorded above. For an extended 7-bit character, for example, the offset increases by 1 when the character is read, while the number of characters participating in encoding increases by 2: the offset advances in 16-bit steps, whereas the participating count is computed per encoded unit. An extended 7-bit character is formed by prefixing one additional 7-bit character, giving a stored length of 14 bits. If the higher-level scheme finally recorded is 7-bit, the number of characters participating in encoding can therefore exceed the recorded character offset.

The detailed flow of this step is shown in Figure 3:

F1. Determine whether the saved higher-level scheme is 7-bit. If not, go to step F2; otherwise go to step F3.

This check establishes whether the saved higher-level scheme is 8-bit or UCS2. If it is, the number of characters participating in encoding equals the offset, so no further calculation is needed; otherwise the participating count must be calculated.

F2. Return the total number of characters analyzed so far in the current segmentation pass.

No further calculation of the participating count is needed; the total number of characters analyzed so far in this pass is returned.

F3. Get the type of the current character.

The current character type is taken from step E above and used as a parameter in the calculation that follows.

F4. Determine whether the character is an extended 7-bit character. If so, go to step F5; otherwise go to step F6.

F5. Increase the total number of extended 7-bit characters analyzed in this pass by 1.

An extended 7-bit character is formed by prefixing one additional 7-bit character and has a stored length of 14 bits, so the participating count must be increased by 1. Under the 8-bit or UCS2 scheme such a character counts as one character, but under the 7-bit scheme it must count as two.

F6. Return the sum of the total number of characters analyzed in this pass and the total number of extended 7-bit characters.

Adding the counted number of non-extended 7-bit characters to the extended 7-bit total computed above gives the number of characters that currently need to participate in encoding.
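Steps F1 to F6 reduce to a short calculation. The sketch below assumes the two running totals are already maintained by the caller; the function and parameter names are hypothetical.

```python
def encoded_units(total_chars, ext7_chars, saved_scheme_bits):
    """Number of units that participate in encoding (steps F1 to F6).

    total_chars        characters read so far in this pass (the offset)
    ext7_chars         how many of them are extended 7-bit characters
    saved_scheme_bits  7, 8 or 16, the saved higher-level scheme
    """
    if saved_scheme_bits != 7:        # F1 -> F2: 8-bit or UCS2, one unit each
        return total_chars
    return total_chars + ext7_chars   # F6: each extended character adds one unit
```

For example, ten characters of which two are extended occupy twelve 7-bit units, but only ten units once the span has been promoted to the 8-bit or UCS2 scheme.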

G. Using the optimal encoding scheme and the current count of characters participating in encoding, calculate the total storage length these characters occupy.

This step uses the higher-level scheme determined above and the recorded offset of the characters read so far to calculate the storage length those characters occupy under that scheme. Suppose, for example, that the loop has reached the 70th character and finds that it is a UCS2 character. All 70 characters must then be encoded with UCS2, and the storage length they occupy is calculated as follows:

Storage length = number of characters participating in encoding × character length under the current scheme = 70 × 16 = 1120 bits.
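The formula above is a single multiplication. A minimal sketch, with a hypothetical function name:

```python
def storage_bits(units, bits_per_unit):
    """Storage length in bits for `units` encoded units (step G)."""
    return units * bits_per_unit


# The 70-character UCS2 example from the text.
ucs2_example = storage_bits(70, 16)
# 153 standard 7-bit characters, the largest count that fits a 1072-bit sub-message.
seven_bit_example = storage_bits(153, 7)
```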

H. Determine whether the occupied storage length is greater than 1072 bits. If it is less than or equal to 1072 bits, go to step I; otherwise go to step J.

This step notes a possible split point in advance. A sub-message of a long message can occupy at most 1072 bits, so if the message turns out to be long, a split mark must be set at the position where it has to be divided. The test is "less than or equal to" because the storage length at this point may be exactly 1072 bits or may fall slightly short of it. If a character straddles the 1072-bit position, it cannot be sent in the same short message as the preceding characters, and the split point can only be set at the previous character. It is only a possible split point because a non-long message can occupy up to 1120 bits: if all characters of the edited message have been analyzed and the storage length does not exceed 1120 bits, there is no need to split, since the edited message can be sent as a single non-long message.

I. Save the current character offset and encoding scheme, then return to step B.

The current character offset and encoding scheme are saved in preparation for completing this split; they are used to build the short message that is finally sent. Steps C and E have already saved an offset and a scheme, but if the edited message is a long message, then once the characters read occupy more than 1072 bits (and before they exceed 1120 bits) the offset saved in step C will have been modified, and the scheme saved in step E may have been modified as well. When it is later confirmed that the message must be split, the offset and scheme needed to build the sub-message would then be wrong and the split would fail. The offset and scheme at the split point must therefore be saved separately.

J. Determine whether the message type flag indicates a long message. If so, go to step M; otherwise go to step K.

This step is reached when the occupied storage length exceeds 1072 bits. The message type flag is checked so that the transmission can be marked as a long message, letting the network or the recipient know that the sub-messages belong to one long message. If the message is split, its type flag changes from non-long (its value before the first split) to long. A sub-message of a long message can carry only 1072 bits, so the split can be performed as soon as the characters read occupy more than 1072 bits. If the message is not yet known to be long, character analysis continues while the storage length is less than or equal to 1120 bits, until the message is finally determined to be non-long.

K. Determine whether the occupied storage length is less than or equal to 1120 bits. If so, save the offset of the characters read and the encoding scheme separately and return to step B; otherwise go to step L.

This step checks, while analysis continues with a storage length of at most 1120 bits, whether the edited message may still be a non-long message. If the occupied storage length exceeds 1120 bits, the message is confirmed to be long. If it is less than or equal to 1120 bits and all characters have been read, the edited message is a non-long message, and the offset of the characters read and the encoding scheme are saved separately for use in encoding the message.

Steps F to K above determine the message type and thereby the maximum length that the independent short message can carry.
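The branching in steps H, J and K depends only on the current storage length and the long-message flag. The sketch below returns the letter of the step the flow moves to; the function name and the string return values are hypothetical.

```python
SUB_MESSAGE_BITS = 1072   # capacity of one sub-message of a long message
SINGLE_MESSAGE_BITS = 1120  # capacity of a non-long message


def next_step(bits, is_long):
    """Return the step reached after the length checks in steps H, J and K."""
    if bits <= SUB_MESSAGE_BITS:
        return "I"   # H: still fits a sub-message, save the split point
    if is_long:
        return "M"   # J: already known to be long, build the sub-message
    if bits <= SINGLE_MESSAGE_BITS:
        return "K"   # K: may still fit a single non-long message, keep reading
    return "L"       # too large for one message, mark it as long
```

Keeping the two limits as named constants makes the 48-bit window between them explicit: it is the only range in which the flow keeps reading without having a sub-message-sized split point for the current character.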

Steps L and M below encode the characters within that length with the selected scheme in order to build a short message that can be sent independently.

L. The edited message is deemed to require splitting; change the message type flag from non-long to long.

At this point the offset of the characters read and the encoding scheme are recorded. The offset serves as the cut point of the long message, and the scheme is used to encode the characters.

M. Save the character string that can form one short message, together with its encoding scheme, string length (offset) and related information.

This step completes one split, so a short message that does not exceed the standard length is built. If some characters have not yet been analyzed, continue with the following steps.

N. Determine whether the end flag is set. If so, go to step O; otherwise return to step A.

This step checks whether the end of the edited message has been reached, that is, whether the end marker has been read. If so, the segmentation flow ends; otherwise the flow returns to step A and continues with the split for the next sub-message. The character-reading step of the next pass starts from the character following the one pointed to by the offset recorded in step C of the previous pass, and when the maximum length of one long-message transmission is reached again, the characters within that length are cut off again.

O. Segmentation of the short message ends.

An example illustrates the short message processing method of the invention. Suppose the message contains 160 characters, of which the first 153 are 7-bit characters and the last 7 are Unicode characters. If UCS2 is simply applied to the whole message, each character is allotted 16 bits and the storage required is 160 × 16 = 2560 bits. Each sub-message of a long message can carry at most 1072 bits, so these 160 characters must be sent as 3 sub-messages (67 + 67 + 26 characters).
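The arithmetic of this baseline case can be checked with a short sketch. The function name is hypothetical; the 1072-bit and 16-bit figures are those given in the text.

```python
def naive_ucs2_parts(n_chars, part_bits=1072, char_bits=16):
    """Sub-message sizes when every character is encoded as UCS2."""
    per_part = part_bits // char_bits   # characters that fit one sub-message
    sizes = []
    while n_chars > 0:
        take = min(per_part, n_chars)
        sizes.append(take)
        n_chars -= take
    return sizes


baseline = naive_ucs2_parts(160)
```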

Under the invention, the whole message is not uniformly encoded with UCS2. The message is first found to be of the long type and is then split. The first split point falls at storage bit 1071 of the message, that is, at the 153rd character. These 153 characters are encoded with the 7-bit scheme and sent as the first sub-message of the long message. The remaining 7 characters are encoded with UCS2 and sent as the second sub-message, which saves one short message transmission.
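The whole flow can be condensed into a greedy segmenter that reproduces this example. This is a simplified sketch, not the patented procedure itself: the name `segment` and its return shape are hypothetical, the character tables are assumed to follow the GSM 03.38 default alphabet and extension table (escape character omitted), and headers, end markers and the per-step bookkeeping of steps A to O are left out.

```python
# Assumed tables: GSM 03.38 default alphabet and extension table (illustrative).
STD7 = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
EXT7 = set("^{}\\[~]|€\f")


def segment(text):
    """Split text greedily; return a list of (substring, bits_per_unit)."""
    parts, start, is_long = [], 0, False
    while start < len(text):
        scheme, ext, over255 = 7, 0, False   # step A: reset for each pass
        fit = None     # last split point that fits 1072 bits (step I)
        whole = None   # candidate single message within 1120 bits (step K)
        for i in range(start, len(text)):
            ch = text[i]
            over255 = over255 or ord(ch) > 255
            if ch in EXT7:
                ext += 1                     # counts double under 7-bit
            elif ch not in STD7 and scheme == 7:
                scheme = 8                   # not a 7-bit character
            if scheme == 8 and over255:
                scheme = 16                  # 8-bit cannot hold code > 255
            n = i - start + 1
            bits = (n + ext if scheme == 7 else n) * scheme   # steps F and G
            if bits <= 1072:
                fit = (i + 1, scheme)
            elif not is_long and bits <= 1120:
                whole = (i + 1, scheme)
            else:
                is_long = True               # step L (or already long, step J)
                break
        end, used = fit
        if whole and not is_long and whole[0] == len(text):
            end, used = whole                # fits one non-long message
        parts.append((text[start:end], used))   # step M
        start = end
    return parts
```

With 153 Latin letters followed by 7 Chinese characters, the sketch yields two parts, the first encoded with 7 bits per unit and the second with 16, matching the split described above. Because it is greedy and stops as soon as a sub-message is full, it does not attempt to find a globally minimal number of parts.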

As can be seen from the above, before a long message containing a mixture of character types is sent, the invention splits its content and encodes each resulting short message with the better-suited scheme, instead of encoding the whole message with a single scheme. In other words, it dynamically analyzes the type of every character to decide which scheme an independently sendable short message should use, dynamically tracks the maximum number of characters that can be encoded under that scheme, and builds each independently sendable short message with the scheme that lets it carry the most characters, so that each one carries as much content as possible. Applying this method repeatedly to a long message may result in several encoding schemes and split patterns being used across the content. This avoids the larger number of transmissions, the poor use of message space and the higher user cost that result from simply encoding the whole message with one scheme, and so reduces the number of sub-messages and improves space utilization.

As mentioned earlier, the volume of mobile short messages in China reached 132.25 billion from January to April 2006. If 1% of these were long messages, that would be 1.3225 billion long messages. Assuming that each long message saves one transmission, 1.3225 billion transmissions would be avoided. At 0.1 yuan per short message, this would reduce messaging fees by 132.25 million yuan and considerably lighten the cost to users. Because the network would also perform 1.3225 billion fewer message transfers, the load on the system would be reduced substantially.

Referring to Figure 4, the short message processing method of the invention also provides a second embodiment, which performs the segmentation when the edited message is already known to be of the long type. This embodiment is similar to the first embodiment and basically comprises the following steps:

A. Initialize the encoding scheme of the message to be sent to the lowest level, that is, the 7-bit scheme; initialize the message type to non-long; and set the extended 7-bit character count to 0.

The purpose of the initialization is to preset, whatever the content of the message, the lowest-level encoding scheme and the non-long message type. If segmentation shows that all of the content belongs to the lowest-level scheme, no higher-level scheme is needed, which saves storage space and reduces the number of sub-messages. If the message is finally found to be long, it may be split several times. Step A of each pass must reinitialize the encoding scheme to the lowest level, but the message type flag is initialized only once: when there are several passes, it is initialized in the first pass and not in the others.

B. Read one character from the content of the short message to be sent and determine whether it is the end-of-text marker. If so, update the end flag and go directly to step M; otherwise continue with the following steps.

Analysis starts from the first character. If the first character is already the end of the message, the message is empty. In later iterations of the analysis loop, if the character read marks the end of the message, no further analysis is needed and the loop ends.

C. Record the offset of the character just read.

The character offset is measured relative to the first character read: the first character read has offset 1, and the n-th character read has offset n. Characters are read in 16-bit steps, because all characters of the 7-bit, extended 7-bit, 8-bit or UCS2 types have already been converted to Unicode characters beforehand. In the subsequent segmentation flow, the character offset serves as the basis for the split point.

D. Determine the type of the character and its encoding scheme.

That is, determine which class the character belongs to (standard 7-bit, extended 7-bit, 8-bit or UCS2) and from that determine the encoding scheme it requires (7-bit, 8-bit or UCS2).

E. Compare with the previously saved higher-level encoding scheme. If the scheme required by this character is of a higher level, replace the saved scheme with the current one.

The purpose of this step is as follows. Each independently sent short message can use only one encoding scheme, and that scheme must be able to encode all of its characters. Therefore, while the characters are being analyzed, the scheme of any higher-level character encountered is recorded. This yields the higher-level scheme required by the characters in the message, which in turn makes it easy to calculate the storage length the message occupies.

F. Count the number of characters that currently need to participate in encoding, according to the higher-level encoding scheme.

The purpose of this step is to count, in each iteration of the analysis loop, how many units the characters read so far contribute under the current higher-level scheme, so that the count can later be used to decide whether the message is a long message.

G. Using the optimal encoding scheme and the current count of characters participating in encoding, calculate the total storage length these characters occupy.

This step uses the higher-level scheme determined above and the recorded offset of the characters read so far to calculate the storage length those characters occupy under that scheme. Suppose, for example, that the loop has reached the 70th character and finds that it is a UCS2 character. All 70 characters must then be encoded with UCS2, and the storage length they occupy is calculated as follows:

Storage length = offset of the characters read in this pass × character length under the current scheme = 70 × 16 = 1120 bits.

H. Determine whether the occupied storage length is greater than 1072 bits. If it is less than or equal to 1072 bits, go to step I; otherwise go to step M.

This step notes the split point in advance. A sub-message of a long message can occupy at most 1072 bits, so a split mark must be set at the position where the message has to be divided. The test is "less than or equal to" because the storage length at this point may be exactly 1072 bits, or may fall slightly short of it when a character straddles the 1072-bit position. In that case the character cannot be sent in the same short message as the preceding characters, and the split point can only be set at the previous character.

I. Save the current character offset and encoding scheme, then return to step B.

The current character offset and encoding scheme are saved in preparation for completing this split; they are used to build the short message that is finally sent. Steps C and E have already saved an offset and a scheme, but if the edited message is a long message, then once the characters read occupy more than 1072 bits the offset saved in step C will have been modified, and the scheme saved in step E may have been modified as well. When it is later confirmed that the message must be split, the offset and scheme needed to build the sub-message would then be wrong and the split would fail. The offset and scheme at the split point must therefore be saved separately.

Steps J, K and L of the first embodiment are omitted here, because there is no need to determine whether the message is long.

M、保存所能够构成一条短信的字符串、编码方式以及字符串长度(偏移量)等信息。M, save the information such as the character string that can constitute a short message, the encoding method and the length (offset) of the character string.

此步骤是短信一次分割完成,因此组建一条不超过标准长度的短信。如果还有字符没有分析完,继续以下步骤。This step is completed by splitting the text message once, so a text message not exceeding the standard length is formed. If there are still characters that have not been analyzed, continue with the following steps.

N. Determine whether the end marker has been reached. If so, go to step O; otherwise return to step A.

At this point the method checks whether the end of the edited message, that is, the end marker, has been reached. If so, the splitting process ends; otherwise the method returns to step A and splits off the next sub-message. Reading for the next sub-message starts from the character that follows the one indicated by the character offset recorded in step C during the previous split.

O. Message splitting ends.
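
The splitting loop described in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the three-level encoding table and the character classifier (ASCII as 7-bit, Latin-1 as 8-bit, everything else as UCS2) are assumptions made for the example, and only the 1072-bit limit comes from the text.

```python
# Hypothetical sketch of the per-sub-message splitting loop.
LIMIT_BITS = 1072  # maximum storage of one sub-message, as stated in the text
ENCODINGS = [("7bit", 7), ("8bit", 8), ("ucs2", 16)]  # lowest to highest level


def char_level(ch: str) -> int:
    """Assumed classifier: ASCII -> 7-bit, Latin-1 -> 8-bit, otherwise UCS2.

    A real handset would test against the GSM default alphabet instead.
    """
    cp = ord(ch)
    if cp < 0x80:
        return 0
    if cp < 0x100:
        return 1
    return 2


def split_message(text: str) -> list:
    """Return (substring, encoding name) pairs, one per sub-message."""
    segments = []
    start = 0
    while start < len(text):
        level = 0   # encoding level saved so far for this sub-message
        end = start  # offset one past the last accepted character
        while end < len(text):
            # Raise the saved level if the next character needs a higher one.
            new_level = max(level, char_level(text[end]))
            bits = (end - start + 1) * ENCODINGS[new_level][1]
            if bits > LIMIT_BITS:
                break  # the next character would cross the 1072-bit position
            level = new_level
            end += 1
        segments.append((text[start:end], ENCODINGS[level][0]))
        start = end  # the next sub-message starts at the following character
    return segments
```

Under these assumptions, a message of 153 ASCII characters followed by 67 Chinese characters is split into two sub-messages (one 7-bit, one UCS2), whereas encoding all 220 characters in UCS2 at 67 characters per sub-message would need four.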

As the above shows, this embodiment has fewer steps than the first embodiment of the present invention because it does not need to determine whether the message is a long message. It nevertheless achieves the same aims: fewer segments per long message, better space utilization, lower user cost and lower system load.

The third embodiment of the present invention applies when the message content is found to contain characters that can only be encoded in UCS2. The following method can be embedded in the first or second embodiment to speed up the analysis. The analysis steps are as follows:

F'. In step D of the first or second embodiment, as soon as a character that can only be encoded in UCS2 is identified, determine the length of the message directly. This is similar to step F above, except that the length counted is that of the entire message.

K'. If the entire message length is less than or equal to 70, save all the characters with UCS2 as the encoding method, and save the analysis information to build one short message.

L'. If the entire message length is greater than 70, the message content must be split into multiple sub-messages. Perform step L of the first embodiment and change the message type flag to the long-message state.

C'. Using the offset from step C above, determine whether the character offset is greater than 67.

I'. If the number of characters is less than or equal to 67, take 67 characters directly from the start position, use UCS2 as the encoding method, and save the analysis information to build one short message.

J'. If it is greater than 67, cut off and save all characters preceding this character, using the higher-level encoding method determined for those preceding characters, and save the analysis information to build one short message. Then take 67 characters starting from this character (or all remaining characters if fewer than 67 remain), use UCS2 as the encoding method, and save the analysis information to build one short message. If characters still remain, continue splitting in the same way.

With this speed-up, once a character that can only be encoded in UCS2 is found while judging character types, the subsequent character-type checks are skipped and the message is processed directly with UCS2. This achieves the same effect as the first and second embodiments while speeding up processing.
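
One possible reading of steps F' to J' is sketched below. The function name, its parameters and the interpretation of the "character offset" as the 1-based count of characters read up to and including the first UCS2-only character are assumptions made for illustration; only the thresholds 70 and 67 come from the text.

```python
# Hypothetical sketch of the UCS2 fast path (third embodiment).
UCS2_SINGLE_MAX = 70  # characters in one stand-alone UCS2 message
UCS2_PART_MAX = 67    # characters in one UCS2 sub-message of a long message


def ucs2_fast_path(text: str, position: int, prior_encoding: str) -> list:
    """Split `text` once a UCS2-only character has been found.

    position       -- assumed 1-based index of the first UCS2-only character
    prior_encoding -- encoding already determined for the characters before it
    Returns (substring, encoding name) pairs.
    """
    # K': the whole message fits in one UCS2 short message.
    if len(text) <= UCS2_SINGLE_MAX:
        return [(text, "ucs2")]

    # L': the message is long and must be split.
    segments = []
    if position <= UCS2_PART_MAX:
        # I': start cutting 67-character UCS2 pieces from the beginning.
        start = 0
    else:
        # J': the preceding characters keep their own encoding.
        segments.append((text[:position - 1], prior_encoding))
        start = position - 1

    # Remaining text is cut into UCS2 pieces of at most 67 characters.
    for i in range(start, len(text), UCS2_PART_MAX):
        segments.append((text[i:i + UCS2_PART_MAX], "ucs2"))
    return segments
```

For example, with 100 ASCII characters followed by 50 Chinese characters and `position=101`, this sketch yields one 7-bit sub-message of 100 characters and one UCS2 sub-message of 50 characters. It assumes the caller has already ensured that the preceding characters fit in one sub-message under `prior_encoding`.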

The encoding method chosen in the above process need not be unique. For example, if after counting all the characters within the maximum length it turns out that either 7-bit or 8-bit encoding can encode all of them, the above process would select 7-bit encoding for the message characters before sending. 8-bit encoding may be selected here instead, and this likewise reduces the number of segments of a long message.
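
The effect of the encoding choice on the number of segments can be checked with a little arithmetic. The helper names below are hypothetical; the 1072-bit sub-message limit is taken from the text, and the character widths of 7, 8 and 16 bits correspond to 7-bit, 8-bit and UCS2 encoding.

```python
# Hypothetical helpers: sub-message capacity and segment count per encoding.
LIMIT_BITS = 1072  # maximum storage of one sub-message, as stated in the text


def chars_per_segment(bits_per_char: int) -> int:
    """Whole characters that fit in one sub-message."""
    return LIMIT_BITS // bits_per_char


def segments_needed(n_chars: int, bits_per_char: int) -> int:
    """Sub-messages needed when every character uses the same width."""
    capacity = chars_per_segment(bits_per_char)
    return -(-n_chars // capacity)  # ceiling division on integers


for name, width in (("7-bit", 7), ("8-bit", 8), ("UCS2", 16)):
    print(name, chars_per_segment(width), segments_needed(300, width))
```

For a 300-character message this gives capacities of 153, 134 and 67 characters and segment counts of 2, 3 and 5, so 8-bit encoding is less compact than 7-bit but still needs fewer sub-messages than UCS2.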

Referring to FIG. 5, the present invention also provides a short message processing device comprising a character type analysis module 100, an encoding method judging module 200, an encoding module 400, a short message type judging module 300 and a long message interception module 500.

The character type analysis module 100 analyzes the types of the characters in a short message before the message is sent. It comprises a character reading module 101 and a character type and encoding judging module 102. The character reading module 101 reads one character from the message to be sent, and the character type and encoding judging module 102 determines the type of that character and its current encoding method.

The encoding method judging module 200 is used when the characters within the maximum length that can be sent in one short message include no character type that can only be encoded with the highest-level encoding. In that case it selects an encoding method, of a level lower than the highest-level encoding method, that can encode all character types within that length. The module comprises an encoding method comparison module 202 and an encoding method saving module 201. The comparison module 202 compares the current encoding method with the previously saved one; if the level of the character's encoding method is higher than that of the saved method, the saved method in the saving module 201 is replaced with the current one. The character reading module 101 then continues reading characters until the storage length of the characters read reaches the maximum length.
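
The interplay of the comparison module 202 and the saving module 201 amounts to keeping a running maximum of the encoding level. The sketch below is illustrative only; the `Encoding` enum and the `EncodingStore` class are hypothetical names, not part of the described device.

```python
# Hypothetical sketch of the compare-and-save behaviour of modules 202 and 201.
from enum import IntEnum


class Encoding(IntEnum):
    """Encoding methods ordered by level, lowest first (assumed ordering)."""
    SEVEN_BIT = 1
    EIGHT_BIT = 2
    UCS2 = 3


class EncodingStore:
    """Keeps the highest encoding level seen so far for the current message."""

    def __init__(self) -> None:
        self.saved = Encoding.SEVEN_BIT  # start from the lowest level

    def update(self, current: Encoding) -> bool:
        """Replace the saved method if `current` is of a higher level.

        Returns True when the saved method was replaced.
        """
        if current > self.saved:
            self.saved = current
            return True
        return False
```

Calling `update` once per character read leaves `saved` holding the lowest-level encoding method that can still encode every character seen so far.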

The encoding module 400 encodes the characters within that length using the selected encoding method to build one short message.

The short message type judging module 300 comprises a character quantity counting module 301, a storage length calculation module 302, a storage length judging module 303, and a module 304 for separately saving the character offset and encoding method. The character quantity counting module 301 counts, according to the saved encoding method, the number of characters that currently need to be encoded. The storage length calculation module 302 calculates, from the saved encoding method and the current total number of characters to be encoded, the total storage length these characters will occupy. The storage length judging module 303 determines whether the occupied storage length is less than or equal to 1120 bits; if it is, and all characters have been read, the message type is judged to be a non-long message; otherwise the message type is judged to be a long message. The separate saving module 304 separately saves the character offset read and the encoding method when the occupied storage length is judged to be less than or equal to 1120 bits and unread characters remain; the saved encoding method is used to encode the characters once the message type has been determined to be a non-long message.
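
The decision made by modules 302 to 304 can be sketched as a small function. The function names and the three-way return value are assumptions for illustration; in particular, treating "fits so far but characters remain" as a separate state in which reading continues is an interpretation of the text. The 1120-bit limit comes from the description.

```python
# Hypothetical sketch of the message type decision (modules 302, 303, 304).
SINGLE_LIMIT_BITS = 1120  # storage limit of one non-long message, per the text


def storage_bits(char_count: int, bits_per_char: int) -> int:
    """Storage length occupied by the characters read so far (module 302)."""
    return char_count * bits_per_char


def judge_message(char_count: int, bits_per_char: int, all_read: bool) -> str:
    """Return 'single', 'continue' or 'long' (modules 303 and 304)."""
    if storage_bits(char_count, bits_per_char) > SINGLE_LIMIT_BITS:
        return "long"      # exceeds 1120 bits: must be split
    if all_read:
        return "single"    # fits and nothing is left: non-long message
    return "continue"      # fits so far: save offset and encoding, keep reading
```

With these definitions, 160 characters at 7 bits or 70 characters at 16 bits occupy exactly 1120 bits and are still a single message, while 161 or 71 characters respectively make the message long.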

When the occupied storage length is greater than 1120 bits and the message type is judged to be a long message, the long message interception module 500 sets the interception points of the long message according to the recorded character offset and encoding method of the current read, so that the characters can be encoded.

The basic operating flow of the short message processing device of the present invention is as follows. The character type analysis module 100 starts first and reads characters from the edited message before it is sent. Each character read is followed by the analysis actions described below, until the loop ends.

Specifically, the character reading module 101 reads one character from the message to be sent and records the character offset. The character type and encoding judging module 102 determines the type of the character and its current encoding method. The encoding method comparison module 202 compares the current encoding method with the previously saved one; if the level of the character's encoding method is higher than that of the saved method, the saved method in the encoding method saving module 201 is replaced with the current one. The character quantity counting module 301 counts, according to the saved encoding method, the number of characters that currently need to be encoded, and the storage length calculation module 302 calculates, from the saved encoding method and the current total number of characters to be encoded, the total storage length these characters will occupy. The storage length judging module 303 then determines whether the occupied storage length is less than or equal to 1120 bits; if it is, and all characters have been read, the message type is judged to be a non-long message; otherwise the message type is judged to be a long message.

Next, when the occupied storage length is judged to be less than or equal to 1120 bits and unread characters remain, the separate saving module 304 separately saves the character offset read and the encoding method, for use in encoding the characters once the message type has been determined to be a non-long message. If the storage length is greater than 1120 bits, the long message interception module 500 sets the interception points of the long message according to the recorded character offset and encoding method of the current read, so that the characters can be encoded. This continues until the entire edited message has been analyzed.

When the character type analysis module 100 determines that a character requires UCS2 encoding, a character offset recording module records the length of the entire message. If that length is greater than 70, the short message type judging module 300 determines that the message is a long message. If, in addition, the character offset recorded by the character offset recording module is greater than 67, the long message interception module 500 cuts off the 67 characters preceding this character, and the encoding module 400 encodes them.

As the above shows, before a long message containing a mixture of character types is sent, the present invention uses the encoding method judging module and the short message type judging module to determine, for each sub-message, a preferred encoding method that can encode all of its characters, instead of encoding the whole message with one uniform method. In other words, the character type analysis module dynamically analyzes the type of each character and works with the encoding method judging module to decide which encoding method should be selected for each independently sendable message. The number of characters that can be encoded under that method is counted dynamically, and the encoding method that allows the most characters to be sent independently is used to build each message, so that every independently sent message carries as much content as possible. This avoids the situation in which a single encoding method is simply applied to the whole message, causing a long message to be sent in many segments, with poor space utilization and higher user cost. The invention thereby reduces the number of segments of a long message, improves space utilization, and lowers both the user's cost and the system load.

The short message processing method and device provided by the present invention have been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (13)

The character type analysis module comprises a character reading module and a character type and encoding judging module, and the encoding method judging module comprises an encoding method comparison module and an encoding method saving module. The character reading module is configured to read characters one by one from the short message to be sent. The character type and encoding judging module is configured to determine the type of the character and its current encoding method. The encoding method comparison module is configured to compare the encoding method of the current character with the encoding method that can just encode all the characters read so far; if the level of the current character's encoding method is higher, the current character's encoding method is saved, and the character reading module continues to read characters until the storage length of the characters read reaches the maximum length;
10. The short message processing device according to claim 9, characterized in that it further comprises a short message type judging module, which comprises a character quantity counting module, a storage length calculation module and a storage length judging module. The character quantity counting module is configured to count, according to the saved encoding method, the number of characters that currently need to be encoded. The storage length calculation module is configured to calculate, from the saved encoding method and the current total number of characters to be encoded, the total storage length these characters will occupy. The storage length judging module is configured to determine whether the occupied storage length is less than or equal to 1120 bits; if it is, and all characters have been read, the message type is judged to be a non-long message; otherwise the message type is judged to be a long message.
CN2006100914427A | 2006-06-16 | 2006-06-16 | Short message processing method and device | Expired - Fee Related | CN101047733B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2006100914427A CN101047733B (en) | 2006-06-16 | 2006-06-16 | Short message processing method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2006100914427A CN101047733B (en) | 2006-06-16 | 2006-06-16 | Short message processing method and device

Publications (2)

Publication Number | Publication Date
CN101047733A (en) | 2007-10-03
CN101047733B (en) | 2010-09-29

Family

ID=38771939

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN2006100914427A (Expired - Fee Related, CN101047733B (en)) | Short message processing method and device | 2006-06-16 | 2006-06-16

Country Status (1)

Country | Link
CN (1) | CN101047733B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101470708B (en)* | 2007-12-26 | 2011-08-31 | 康佳集团股份有限公司 | Code compression method and apparatus
CN101309483B (en)* | 2008-05-29 | 2011-07-06 | 华为终端有限公司 | Short message encoding and decoding method and terminal
CN101345952B (en)* | 2008-09-03 | 2012-04-04 | 华为终端有限公司 | Data storage and reading method, device and system for customer identity identification card
CN106231323B (en) | 2010-04-13 | 2020-07-28 | Ge视频压缩有限责任公司 | Decoder, decoding method, encoder, and encoding method
KR102669292B1 (en) | 2010-04-13 | 2024-05-28 | 지이 비디오 컴프레션, 엘엘씨 | Sample region merging
CN106067983B (en) | 2010-04-13 | 2019-07-12 | Ge视频压缩有限责任公司 | The method of decoding data stream, the method and decoder for generating data flow
KR101793857B1 (en) | 2010-04-13 | 2017-11-03 | 지이 비디오 컴프레션, 엘엘씨 | Inheritance in sample array multitree subdivision
CN101938719A (en)* | 2010-09-03 | 2011-01-05 | 中兴通讯股份有限公司 | Method for coding and decoding short messages (SMS), device and terminal
CN103327465A (en)* | 2012-03-22 | 2013-09-25 | 吴平 | Method and device for converting voice message into short text
CN102665184B (en)* | 2012-04-24 | 2014-12-10 | 中兴通讯股份有限公司 | Coding method and device for increasing short message utilization rate
CN105472107A (en)* | 2014-08-26 | 2016-04-06 | 中兴通讯股份有限公司 | Terminal information processing method and device
CN105848116A (en)* | 2015-01-15 | 2016-08-10 | 中兴通讯股份有限公司 | Short message transmission method, device and equipment
CN105634674A (en)* | 2016-01-12 | 2016-06-01 | 青岛海信移动通信技术股份有限公司 | Short message processing method and device
CN111507068B (en)* | 2016-10-27 | 2023-08-25 | 青岛海信移动通信技术有限公司 | Input information processing method and device and mobile terminal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN1444378A (en)* | 2002-03-11 | 2003-09-24 | 联想(北京)有限公司 | Ultra-long short message sending method
CN1484419A (en)* | 2002-08-14 | 2004-03-24 | 日本电气株式会社 | Selection of transmission alphabet sets for short message service

Also Published As

Publication number | Publication date
CN101047733A (en) | 2007-10-03

Similar Documents

PublicationPublication DateTitle
CN101047733B (en) Short message processing method and device
CN1310543C (en) Data sending and receiving device and method of digital mobile station
EP2509344A1 (en)Method for transmitting and receiving multimedia information and terminal thereof
US8055085B2 (en)Blocking for combinatorial coding/decoding for electrical computers and digital data processing systems
US7990289B2 (en)Combinatorial coding/decoding for electrical computers and digital data processing systems
CN101350858B (en)Method for decoding short message and user terminal
CN1371232A (en)Icon and cartoon managing method and data structure and command executing mobile terminal
CN101674552A (en)Short message coding method and terminal
CN110865970B (en) A Compressed Traffic Pattern Matching Engine and Pattern Matching Method Based on FPGA Platform
JP2014526098A (en) Method and system for downloading font files
CN101667843A (en)Methods and devices for compressing and uncompressing data of embedded system
CN116346289A (en) A data processing method for computer network center
CN101840394A (en) Data decoding method
CN104360988A (en)Method and device for identifying coding mode of Chinese characters
CN1444378B (en)Ultra-long short message sending method
CN114979094B (en)RTP-based data transmission method, device, equipment and medium
CN101621771B (en)Method, device and system for processing short message encoding
CN118075701B (en)Implementation method for splitting and recombining long short messages for processing multiple coding formats
CN112383888A (en)Short message system, method and equipment
CN1264375C (en)Double modular card and access terminal and method for read of different network short message
CN108632088A (en)Method for processing business, device and server
CN111352932B (en) Method and device for improving data processing efficiency based on bitmap tree algorithm
CN105634674A (en)Short message processing method and device
CN112506876B (en)Lossless compression query method supporting SQL query
CN101352015A (en)Transmission of handwriting over SMS protocol

Legal Events

Date | Code | Title | Description
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant
C17 | Cessation of patent right
CF01 | Termination of patent right due to non-payment of annual fee

Granted publication date:20100929

Termination date:20120616

