Technical Field
The present disclosure relates to a segmentation method, a segmentation system, and a non-transitory computer-readable medium, and more particularly to a segmentation method, segmentation system, and non-transitory computer-readable medium for subtitles.
Background
An online learning platform is a network service that stores a large collection of learning materials on a server so that users can connect to the server over the Internet and browse the materials at any time. The learning materials offered by current online learning platforms include videos, audio, presentations, documents, and forums.
Because online learning platforms store a huge volume of learning materials, the text of those materials must be segmented automatically and paragraph keywords must be established for users' convenience. Accordingly, how to exploit the differences between parts of a learning video's content so that similar topics are segmented and tagged with keywords is a problem to be solved in this field.
Summary of the Invention
A first aspect of the present disclosure provides a segmentation method. The segmentation method includes the following steps: receiving subtitle information, where the subtitle information includes a plurality of subtitle sentences; selecting subtitle sentences according to a set value and grouping the selected subtitle sentences into a first paragraph; performing a common-segmentation-word determination on a first subtitle sentence, where the first subtitle sentence is one of the subtitle sentences; and, according to the result of the common-segmentation-word determination, generating a second paragraph or merging the first subtitle sentence into the first paragraph.
A second aspect of the present disclosure provides a segmentation system that includes a storage unit and a processor. The storage unit stores subtitle information, a segmentation result, an annotation corresponding to the first paragraph, and an annotation corresponding to the second paragraph. The processor is electrically connected to the storage unit and receives the subtitle information, where the subtitle information includes a plurality of subtitle sentences. The processor includes a segmentation unit, a common word detection unit, and a paragraph generation unit. The segmentation unit selects subtitle sentences in a specific order according to a set value and groups the selected subtitle sentences into the first paragraph. The common word detection unit is electrically connected to the segmentation unit and performs a common-segmentation-word determination on a first subtitle sentence, where the first subtitle sentence is one of the plurality of subtitle sentences. The paragraph generation unit is electrically connected to the common word detection unit and, according to the result of the common-segmentation-word determination, generates the second paragraph or merges the first subtitle sentence into the first paragraph.
A third aspect of the present application provides a non-transitory computer-readable medium containing at least one instruction program, which is executed by a processor to perform a segmentation method including the following steps: receiving subtitle information, where the subtitle information includes a plurality of subtitle sentences; selecting subtitle sentences according to a set value and grouping the selected subtitle sentences into a first paragraph; performing a common-segmentation-word determination on a first subtitle sentence, where the first subtitle sentence is one of the subtitle sentences; and, according to the result of the common-segmentation-word determination, generating a second paragraph or merging the first subtitle sentence into the first paragraph.
The segmentation method, segmentation system, and non-transitory computer-readable medium of the present disclosure mainly address the problem that marking video paragraphs manually consumes substantial manpower and time. The keywords corresponding to each subtitle sentence are first calculated; a common-segmentation-word determination is then performed on each subtitle sentence, and according to its result a second paragraph is generated or the first subtitle sentence is merged into the first paragraph, producing a segmentation result. Similar topics in a learning video are thereby segmented and tagged with keywords.
Brief Description of the Drawings
To make the above and other objects, features, advantages, and embodiments of the present application more comprehensible, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of a segmentation system according to some embodiments of the present application;
FIG. 2 is a flow chart of a segmentation method according to some embodiments of the present application;
FIG. 3 is a flow chart of step S240 according to some embodiments of the present application;
FIG. 4 is a flow chart of step S241 according to some embodiments of the present application; and
FIG. 5 is a flow chart of step S242 according to some embodiments of the present application.
[Description of Reference Numerals]
100: Segmentation system
110: Storage unit
130: Processor
DB1: Common segmentation vocabulary database
DB2: Course database
131: Keyword extraction unit
132: Segmentation unit
133: Common word detection unit
134: Paragraph generation unit
135: Annotation generation unit
200: Segmentation method
S210-S250, S241-S242, S2411-S2413, S2421-S2423: Steps
Detailed Description
Multiple embodiments of the present application will be disclosed below with reference to the drawings. For clarity of description, many practical details are described together in the following. However, it should be understood that these practical details are not intended to limit the present application; that is, in some embodiments of the present disclosure, these practical details are unnecessary. In addition, to simplify the drawings, some well-known and commonly used structures and elements are depicted in a simple schematic manner.
In this document, when an element is referred to as being "connected" or "coupled", it may mean "electrically connected" or "electrically coupled". "Connected" or "coupled" may also indicate that two or more elements cooperate or interact with each other. In addition, although the terms "first", "second", and so on are used herein to describe different elements, these terms serve only to distinguish elements or operations described with the same technical term. Unless the context clearly indicates otherwise, these terms neither refer to nor imply an order or sequence, nor are they intended to limit the present invention.
Please refer to FIG. 1, a schematic diagram of a segmentation system 100 according to some embodiments of the present application. As shown in FIG. 1, the segmentation system 100 includes a storage unit 110 and a processor 130. The storage unit 110 is electrically connected to the processor 130 and stores subtitle information, segmentation results, the common segmentation vocabulary database DB1, the course database DB2, the annotation corresponding to the first paragraph, and the annotation corresponding to the second paragraph.
Following the above, the processor 130 includes a keyword extraction unit 131, a segmentation unit 132, a common word detection unit 133, a paragraph generation unit 134, and an annotation generation unit 135. The segmentation unit 132 is electrically connected to the keyword extraction unit 131 and the common word detection unit 133; the paragraph generation unit 134 is electrically connected to the common word detection unit 133 and the annotation generation unit 135; and the common word detection unit 133 is electrically connected to the annotation generation unit 135.
In the various embodiments of the present invention, the storage unit 110 may be implemented as a memory, a hard disk, a flash drive, a memory card, or the like. The processor 130 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a logic circuit, other similar elements, or a combination of the above elements.
Please refer to FIG. 2, a flow chart of a segmentation method 200 according to some embodiments of the present application. In one embodiment, the segmentation method 200 shown in FIG. 2 can be applied to the segmentation system 100 of FIG. 1: the processor 130 segments the subtitle information according to the steps of the segmentation method 200 described below to generate a segmentation result and an annotation corresponding to each paragraph. As shown in FIG. 2, the segmentation method 200 first executes step S210 to receive subtitle information. In one embodiment, the subtitle information includes multiple subtitle sentences. For example, the subtitle information is the subtitle file of a video; the subtitle file has already divided the video content into multiple subtitle sentences according to playback time, and the subtitle sentences are sorted by playback time.
Next, the segmentation method 200 executes step S220 to select subtitle sentences according to a set value and group the selected subtitle sentences into the current paragraph. In one embodiment, the set value can be any positive integer; here a set value of 3 is used as an example, so in this step three subtitle sentences are selected according to playback time to form the current paragraph. For example, if there are N subtitle sentences in total, the first through third subtitle sentences can be selected to form the current paragraph.
Next, the segmentation method 200 executes step S230 to perform a common-segmentation-word determination on the current subtitle sentence. In one embodiment, the common segmentation words are stored in the common segmentation vocabulary database DB1, and the common word detection unit 133 detects whether a common segmentation word appears. Common segmentation words can be divided into common opening words and common closing words. For example, common opening words include "next" and "let us begin", while common closing words include "that concludes the explanation" and "we will stop here today". This step detects whether a common segmentation word appears and, if so, its type (a common opening word or a common closing word).
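The determination in step S230 can be illustrated with a minimal sketch. The phrase lists and function name below are illustrative assumptions, not taken from the embodiments; in practice the phrases would come from the common segmentation vocabulary database DB1:

```python
# Minimal sketch of the common-segmentation-word check of step S230.
# The phrase lists are hypothetical examples, not from the disclosure.
START_PHRASES = ["next, we", "let us begin", "moving on to"]
END_PHRASES = ["that concludes", "to sum up", "we will stop here today"]

def detect_common_word(sentence: str) -> str:
    """Return 'start', 'end', or 'none' depending on which phrase list matches."""
    lowered = sentence.lower()
    if any(p in lowered for p in START_PHRASES):
        return "start"
    if any(p in lowered for p in END_PHRASES):
        return "end"
    return "none"
```

A simple substring match suffices for the sketch; a deployed system could instead match against the phrases stored in DB1.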
Next, the segmentation method 200 executes step S240 to generate the next paragraph or merge the current subtitle sentence into the current paragraph according to the result of the common-segmentation-word determination. In one embodiment, based on the detection result of the common word detection unit 133, it is decided whether to generate a new paragraph or to merge the subtitle sentence currently being processed into the current paragraph. For example, if the current paragraph consists of the first through third subtitle sentences and the sentence currently being processed is the fourth subtitle sentence, then depending on the determination result the fourth subtitle sentence is either merged into the current paragraph or made the start of a new paragraph.
Following the above, after step S240 merges the current subtitle sentence into the current paragraph, the common-segmentation-word determination is performed on the next subtitle sentence, so step S230 is executed again. For example, if the fourth subtitle sentence is merged into the current paragraph, the determination is next performed on the fifth subtitle sentence. If step S240 instead generates the next paragraph, subtitle sentences are again selected in a specific order according to the set value and grouped into the next paragraph, so step S220 is executed again. For example, if the fourth subtitle sentence is classified into the next paragraph, the fifth, sixth, and seventh subtitle sentences are selected and added to the next paragraph. The segmentation operation is thus repeated until all subtitle sentences have been segmented, finally producing the segmentation result.
Step S240 further includes steps S241 and S242; please also refer to FIG. 3, a flow chart of step S240 according to some embodiments of the present application. As shown in FIG. 3, the segmentation method 200 further executes step S241: if the current subtitle sentence is associated with a common segmentation word, segmentation is performed to generate the next paragraph, and subtitle sentences are selected in a specific order according to the set value and added to the next paragraph. Step S241 in turn includes steps S2411 to S2413; please further refer to FIG. 4, a flow chart of step S241 according to some embodiments of the present application. As shown in FIG. 4, the segmentation method 200 executes step S2411 to decide, based on the determination result, whether the current subtitle sentence is associated with an opening segmentation word or a closing segmentation word. Continuing the above embodiment, the determination result of step S230 decides whether the current subtitle sentence is associated with an opening segmentation word or a closing segmentation word.
Following the above, the segmentation method 200 executes step S2412: if the current subtitle sentence is associated with an opening segmentation word, the current subtitle sentence is taken as the starting sentence of the next paragraph. For example, if the determination result detects the word "next" in the fourth subtitle sentence, the fourth subtitle sentence becomes the starting sentence of the next paragraph.
The segmentation method 200 likewise executes step S2413: if the current subtitle sentence is associated with a closing segmentation word, the current subtitle sentence is taken as the ending sentence of the current paragraph. For example, if the determination result detects the phrase "that concludes the explanation" in the fourth subtitle sentence, the fourth subtitle sentence becomes the ending sentence of the current paragraph. After the operation of step S241, subtitle sentences are again selected in a specific order according to the set value and grouped into the next paragraph, so step S220 is executed again, which is not repeated here.
Next, the segmentation method 200 executes step S242: if the current subtitle sentence is not associated with any common segmentation word, a similarity value between the current subtitle sentence and the current paragraph is calculated, and if they are similar, the current subtitle sentence is merged into the current paragraph. Step S242 in turn includes steps S2421 to S2423; please further refer to FIG. 5, a flow chart of step S242 according to some embodiments of the present application. As shown in FIG. 5, the segmentation method 200 executes step S2421 to compare whether the difference value between at least one feature of the current subtitle sentence and at least one feature of the current paragraph is greater than a threshold value.
Following the above, in one embodiment, multiple keywords are extracted from the subtitle sentence, and the extracted keywords serve as the at least one feature of the current subtitle sentence. The keywords corresponding to a subtitle sentence are calculated using the TF-IDF (Term Frequency-Inverse Document Frequency) statistical method. TF-IDF evaluates how important a word is to a document in a database: a word's importance increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the database. In this embodiment, the TF-IDF statistical method yields the keywords of the current subtitle sentence. A similarity value between the at least one feature (keywords) of the current subtitle sentence and the at least one feature (keywords) of the current paragraph is then calculated; the higher the similarity value, the closer the content of the current subtitle sentence is to that of the current paragraph.
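As one plausible reading of the feature comparison described above, the following sketch computes TF-IDF weights over whitespace-separated tokens and measures similarity with the cosine of the weight vectors. The exact weighting formula (here, smoothed IDF) and the cosine measure are assumptions; the embodiments do not fix them:

```python
# Sketch of TF-IDF features (step S2421) and a cosine similarity value.
# Weighting details are illustrative, not prescribed by the disclosure.
import math
from collections import Counter

def tfidf(texts):
    """Return one {word: TF-IDF weight} dict per text (whitespace tokens)."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per word
    return [
        {w: (c / sum(d.values())) * (math.log((1 + n) / (1 + df[w])) + 1)
         for w, c in d.items()}
        for d in docs
    ]

def cosine(a, b):
    """Cosine similarity of two sparse weight vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A sentence sharing keywords with the current paragraph then scores a higher similarity value (equivalently, a smaller difference value) than an unrelated one.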
Following the above, the segmentation method 200 executes step S2422: if the difference value is less than the threshold value, the current subtitle sentence is merged into the current paragraph. In one embodiment, the threshold value filters the similarity value: when the similarity value is not less than the threshold value, the content of the current subtitle sentence is similar to that of the current paragraph, so the current subtitle sentence can be merged into the current paragraph. For example, if the similarity value between the fourth subtitle sentence and the current paragraph is not less than the threshold value, their content is similar, so the fourth subtitle sentence is added to the current paragraph.
The segmentation method 200 further executes step S2423: if the difference value is not less than the threshold value, the current subtitle sentence is taken as the starting sentence of the next paragraph, and subtitle sentences are selected in a specific order according to the set value and grouped into the next paragraph. That is, when the similarity value is less than the threshold value, the content of the current subtitle sentence differs from that of the current paragraph, so the current subtitle sentence is determined to be the starting sentence of the next paragraph. For example, if the similarity value between the fourth subtitle sentence and the current paragraph is less than the threshold value, their content differs, so the fourth subtitle sentence becomes the starting sentence of the next paragraph. After the operation of step S2423, subtitle sentences are again selected in a specific order according to the set value and grouped into the next paragraph, so step S230 is executed again, which is not repeated here.
As can be seen from the above segmentation operation, after the segmentation calculation for one subtitle sentence is completed, the calculation for the next subtitle sentence is executed, until all subtitle sentences have been processed. If the number of remaining subtitle sentences is less than the set value, no further segmentation calculation is performed for them; instead, the remaining subtitle sentences are merged directly into the current paragraph. For example, if two subtitle sentences remain, which is fewer than the aforementioned set value of 3, the remaining two subtitle sentences are merged into the current paragraph.
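The overall loop of steps S220 through S240, including the handling of remaining sentences, might be sketched as follows. The callables `detect` and `similar` stand in for the common-word determination and the similarity test, and the control flow is one interpretation of the embodiments, not the definitive implementation:

```python
# Sketch of the segmentation loop (steps S220-S240): seed a paragraph with
# `window` sentences, then start a new paragraph at a common opening word
# (or after a common closing word), or merge/split by similarity.
def segment(sentences, window=3, detect=lambda s: "none", similar=lambda s, p: True):
    paragraphs = []
    i = 0
    while i < len(sentences):
        if len(sentences) - i < window:       # fewer than `window` left:
            if paragraphs:                    # fold remainder into last paragraph
                paragraphs[-1].extend(sentences[i:])
            else:
                paragraphs.append(list(sentences[i:]))
            break
        current = list(sentences[i:i + window])   # seed the current paragraph
        i += window
        while i < len(sentences):
            s = sentences[i]
            kind = detect(s)
            if kind == "start":               # s starts the next paragraph
                break
            i += 1
            current.append(s)                 # s joins the current paragraph
            if kind == "end":                 # s closes the current paragraph
                break
            if not similar(s, current):       # dissimilar: s starts next paragraph
                current.pop()
                i -= 1
                break
        paragraphs.append(current)
    return paragraphs
```

For example, with a set value of 3 and a common opening word detected in the fifth sentence, eight sentences split into a four-sentence paragraph followed by a four-sentence paragraph.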
After the above segmentation steps are completed, the segmentation method 200 executes step S250 to generate the annotation corresponding to each paragraph. For example, if all the subtitle sentences end up divided into three paragraphs, the annotations of the three paragraphs are calculated separately; an annotation can be generated from the keywords corresponding to the subtitle sentences in the paragraph. Finally, the divided paragraphs and their corresponding annotations are stored in the course database DB2 of the storage unit 110. For example, if the difference value is less than the threshold value, the current subtitle sentence is similar to the current paragraph, so the keywords of the subtitle sentence can serve as at least one feature of the current paragraph. If the difference value is not less than the threshold value, the current subtitle sentence is not similar to the current paragraph, so the keywords of the subtitle sentence can serve as at least one feature of the next paragraph.
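One way to realize step S250 is to select the most frequent keywords among a paragraph's sentences as its annotation. The stop-word list and the choice of the top three keywords below are assumptions for illustration only:

```python
# Sketch of step S250: annotate a paragraph with its most frequent keywords.
# The stop-word list and top_n=3 are hypothetical choices.
from collections import Counter

STOP_WORDS = {"the", "a", "is", "to", "of", "and", "we"}

def annotate(paragraph_sentences, top_n=3):
    """Return the top_n most common non-stop-word tokens as the annotation."""
    counts = Counter(
        w for s in paragraph_sentences
        for w in s.lower().split()
        if w not in STOP_WORDS
    )
    return [w for w, _ in counts.most_common(top_n)]
```

In the embodiments the keywords would come from the TF-IDF calculation rather than raw counts; raw counts keep the sketch short.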
As can be seen from the above embodiments of the present application, the main purpose is to solve the problem that marking video paragraphs manually consumes substantial manpower and time. The keywords corresponding to each subtitle sentence are first calculated; a common-segmentation-word determination is then performed on each subtitle sentence, and according to its result the next paragraph is generated or the current subtitle sentence is merged into the current paragraph, producing a segmentation result. Similar topics in a learning video are thereby segmented and tagged with keywords.
In addition, the examples above include sequential exemplary steps, but these steps need not be performed in the order shown. Performing these steps in different orders is within the scope of the present disclosure. Within the spirit and scope of the embodiments of the present disclosure, these steps may be added, replaced, reordered, and/or omitted as appropriate.
Although the present disclosure has been disclosed above by way of embodiments, they are not intended to limit the present disclosure. Any person of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure; the scope of protection of the present disclosure shall therefore be determined by the appended claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862728082P | 2018-09-07 | 2018-09-07 | |
| US62/728,082 | 2018-09-07 | | |
| Publication Number | Publication Date |
|---|---|
| CN110895654A | 2020-03-20 |
| CN110895654B | 2024-07-02 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910105172.8A (CN110895654B, active) | Segmentation method, segmentation system and non-transitory computer readable medium | 2018-09-07 | 2019-02-01 |
| CN201910104946.5A (CN110891202B, active) | Segmentation method, segmented system, and non-transitory computer-readable medium | 2018-09-07 | 2019-02-01 |
| CN201910104937.6A (CN110888896B, active) | Data searching method and data searching system thereof | 2018-09-07 | 2019-02-01 |
| CN201910105173.2A (CN110889034A, pending) | Data analysis method and data analysis system | 2018-09-07 | 2019-02-01 |
| CN201910266133.6A (CN110888994A, pending) | Multimedia data recommendation system and multimedia data recommendation method | 2018-09-07 | 2019-04-03 |
| TW200923860A (en)* | 2007-11-19 | 2009-06-01 | Univ Nat Taiwan Science Tech | Interactive learning system |
| CN101382937B (en)* | 2008-07-01 | 2011-03-30 | 深圳先进技术研究院 | Speech recognition-based multimedia resource processing method and its online teaching system |
| US8140544B2 (en)* | 2008-09-03 | 2012-03-20 | International Business Machines Corporation | Interactive digital video library |
| CN101453649B (en)* | 2008-12-30 | 2011-01-05 | 浙江大学 | Key frame extracting method for compression domain video stream |
| JP5366632B2 (en)* | 2009-04-21 | 2013-12-11 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | Search support keyword presentation device, method and program |
| JP5493515B2 (en)* | 2009-07-03 | 2014-05-14 | 富士通株式会社 | Portable terminal device, information search method, and information search program |
| WO2011088412A1 (en)* | 2010-01-15 | 2011-07-21 | Apollo Group, Inc. | Dynamically recommending learning content |
| JP2012038239A (en)* | 2010-08-11 | 2012-02-23 | Sony Corp | Information processing equipment, information processing method and program |
| US8839110B2 (en)* | 2011-02-16 | 2014-09-16 | Apple Inc. | Rate conform operation for a media-editing application |
| CN102222227B (en)* | 2011-04-25 | 2013-07-31 | 中国华录集团有限公司 | A system based on video recognition and image extraction |
| CN102348049B (en)* | 2011-09-16 | 2013-09-18 | 央视国际网络有限公司 | Method and device for detecting position of cut point of video segment |
| CN102509007A (en)* | 2011-11-01 | 2012-06-20 | 北京瑞信在线系统技术有限公司 | Method, system and device for multimedia teaching evaluation and multimedia teaching system |
| JP5216922B1 (en)* | 2012-01-06 | 2013-06-19 | Flens株式会社 | Learning support server, learning support system, and learning support program |
| US9846696B2 (en)* | 2012-02-29 | 2017-12-19 | Telefonaktiebolaget Lm Ericsson (Publ) | Apparatus and methods for indexing multimedia content |
| US20130263166A1 (en)* | 2012-03-27 | 2013-10-03 | Bluefin Labs, Inc. | Social Networking System Targeted Message Synchronization |
| US9058385B2 (en)* | 2012-06-26 | 2015-06-16 | Aol Inc. | Systems and methods for identifying electronic content using video graphs |
| TWI513286B (en)* | 2012-08-28 | 2015-12-11 | Ind Tech Res Inst | Method and system for continuous video replay |
| WO2014100893A1 (en)* | 2012-12-28 | 2014-07-03 | Jérémie Salvatore De Villiers | System and method for the automated customization of audio and video media |
| JP6205767B2 (en)* | 2013-03-13 | 2017-10-04 | カシオ計算機株式会社 | Learning support device, learning support method, learning support program, learning support system, and server device |
| TWI549498B (en)* | 2013-06-24 | 2016-09-11 | wu-xiong Chen | Variable audio and video playback method |
| CN104572716A (en)* | 2013-10-18 | 2015-04-29 | 英业达科技有限公司 | System and method for playing video files |
| KR101537370B1 (en)* | 2013-11-06 | 2015-07-16 | 주식회사 시스트란인터내셔널 | System for grasping speech meaning of recording audio data based on keyword spotting, and indexing method and method thereof using the system |
| US20150206441A1 (en)* | 2014-01-18 | 2015-07-23 | Invent.ly LLC | Personalized online learning management system and method |
| CN104123332B (en)* | 2014-01-24 | 2018-11-09 | 腾讯科技(深圳)有限公司 | The display methods and device of search result |
| US9892194B2 (en)* | 2014-04-04 | 2018-02-13 | Fujitsu Limited | Topic identification in lecture videos |
| US20150293995A1 (en)* | 2014-04-14 | 2015-10-15 | David Mo Chen | Systems and Methods for Performing Multi-Modal Video Search |
| JP6334431B2 (en)* | 2015-02-18 | 2018-05-30 | 株式会社日立製作所 | Data analysis apparatus, data analysis method, and data analysis program |
| US20160239155A1 (en)* | 2015-02-18 | 2016-08-18 | Google Inc. | Adaptive media |
| CN104978961B (en)* | 2015-05-25 | 2019-10-15 | 广州酷狗计算机科技有限公司 | A kind of audio-frequency processing method, device and terminal |
| CN105047203B (en)* | 2015-05-25 | 2019-09-10 | 广州酷狗计算机科技有限公司 | A kind of audio-frequency processing method, device and terminal |
| TWI571756B (en)* | 2015-12-11 | 2017-02-21 | 財團法人工業技術研究院 | Methods and systems for analyzing reading log and documents corresponding thereof |
| CN105978800A (en)* | 2016-07-04 | 2016-09-28 | 广东小天才科技有限公司 | Method, system and server for pushing questions to mobile terminal |
| CN106202453B (en)* | 2016-07-13 | 2020-08-04 | 网易(杭州)网络有限公司 | Multimedia resource recommendation method and device |
| CN106231399A (en)* | 2016-08-01 | 2016-12-14 | 乐视控股(北京)有限公司 | Methods of video segmentation, equipment and system |
| CN106331893B (en)* | 2016-08-31 | 2019-09-03 | 科大讯飞股份有限公司 | Real-time caption presentation method and system |
| CN108122437A (en)* | 2016-11-28 | 2018-06-05 | 北大方正集团有限公司 | Adaptive learning method and device |
| CN107256262B (en)* | 2017-06-13 | 2020-04-14 | 西安电子科技大学 | An Image Retrieval Method Based on Object Detection |
| CN107623860A (en)* | 2017-08-09 | 2018-01-23 | 北京奇艺世纪科技有限公司 | Multi-medium data dividing method and device |
| Publication number | Publication date |
|---|---|
| CN110891202A (en) | 2020-03-17 |
| SG10201906347QA (en) | 2020-04-29 |
| CN110895654A (en) | 2020-03-20 |
| TW202011222A (en) | 2020-03-16 |
| CN110888994A (en) | 2020-03-17 |
| CN110889034A (en) | 2020-03-17 |
| CN110888896A (en) | 2020-03-17 |
| CN110891202B (en) | 2022-03-25 |
| TW202011221A (en) | 2020-03-16 |
| JP6829740B2 (en) | 2021-02-10 |
| TW202011231A (en) | 2020-03-16 |
| TWI709905B (en) | 2020-11-11 |
| TWI700597B (en) | 2020-08-01 |
| SG10201905523TA (en) | 2020-04-29 |
| JP2020042770A (en) | 2020-03-19 |
| TW202011232A (en) | 2020-03-16 |
| SG10201905236WA (en) | 2020-04-29 |
| SG10201905532QA (en) | 2020-04-29 |
| TWI725375B (en) | 2021-04-21 |
| CN110888896B (en) | 2023-09-05 |
| TWI699663B (en) | 2020-07-21 |
| TW202011749A (en) | 2020-03-16 |
| JP2020042777A (en) | 2020-03-19 |
| TWI696386B (en) | 2020-06-11 |
| JP2020042771A (en) | 2020-03-19 |
| SG10201907250TA (en) | 2020-04-29 |
| Publication | Publication Date | Title |
|---|---|---|
| CN110895654B (en) | | Segmentation method, segmentation system and non-transitory computer readable medium |
| CN108009293A (en) | | Video tab generation method, device, computer equipment and storage medium |
| CN102483743B (en) | | Detecting writing systems and languages |
| US8843815B2 (en) | | System and method for automatically extracting metadata from unstructured electronic documents |
| JP6335898B2 (en) | | Information classification based on product recognition |
| CN107463548B (en) | | Phrase mining method and device |
| CN109275047B (en) | | Video information processing method and device, electronic device, storage medium |
| CN112287914B (en) | | PPT video segment extraction method, device, equipment and medium |
| US20180081861A1 (en) | | Smart document building using natural language processing |
| CN112214984B (en) | | Content plagiarism identification method, device, equipment and storage medium |
| CN102081598B (en) | | Method for detecting duplicated texts |
| CN104679769A (en) | | Method and device for classifying usage scenario of product |
| CN111291572A (en) | | Character typesetting method and device and computer readable storage medium |
| Petryk et al. | | ALOHa: A new measure for hallucination in captioning models |
| CN108875743B (en) | | Text recognition method and device |
| US20150064684A1 (en) | | Assessment of curated content |
| CN111783467A (en) | | A method and device for identifying an enterprise name |
| TW200925895A (en) | | System and method for real-time new event detection on video streams |
| CN107924398B (en) | | System and method for providing a review-centric news reader |
| WO2024188044A1 (en) | | Video tag generation method and apparatus, electronic device, and storage medium |
| CN116029280A (en) | | Method, device, computing equipment and storage medium for extracting key information of document |
| US12314984B2 (en) | | Method and apparatus for displaying product review information, electronic device and storage medium |
| US12423601B2 (en) | | Systems and methods for analysis explainability |
| CN118733717A (en) | | File duplication checking method, device, equipment, storage medium and program product |
| US20190205320A1 (en) | | Sentence scoring apparatus and program |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |