KR101805129B1


Info

Publication number: KR101805129B1
Application number: KR1020160059714A
Authority: KR
Inventors: 장재영; 이규홍; 이병준; 조세진; 한다혜
Original assignee: 한성대학교 산학협력단
Priority date: 2016-05-16
Filing date: 2016-05-16
Publication date: 2017-12-07
Anticipated expiration: 2036-05-16
Also published as: KR20170129333A

Abstract

Translated from Korean

A content classification method according to an embodiment includes: classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as indicating the advertising nature of content; calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether that candidate is related to determining the advertising nature of the sample contents; selecting, based on the calculated degrees of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of content; and classifying at least one content as advertising content or non-advertising content based on the selected classifier.

Description


METHOD FOR CLASSIFYING CONTENTS

The present invention relates to a content classification method. More specifically, it relates to a method of selecting criteria for classifying whether content on the Internet, such as a blog, was written for advertising purposes, and of classifying content as advertising content or non-advertising content based on the selected criteria.

This work was carried out with 2011 government (Ministry of Education) funding through the Basic Research Program of the National Research Foundation of Korea. (Project number: NRF-2011-0022445)

With the advent of the Web 2.0 era, social networking services (SNS) are spreading rapidly and have become the foundation for a variety of venues in which the public can express subjective opinions about society at large.

In particular, blogs provided by portal sites let authors share their knowledge and experience with readers easily, and many people use them. Blogs share knowledge across many fields, and blogs about dining-out information account for a large share of them. Marketing research based on restaurant blogs is also actively under way.

A restaurant blog can convey to readers the subjective or objective information that its author gained by visiting a restaurant in person. Before choosing a restaurant, readers consult the evaluations of the restaurants that blogs recommend. However, the advertising blogs that flood the Internet rarely give readers objective information, because such a blog focuses on representing the interests of the business that commissioned the advertisement and can therefore present distorted information.

Of course, blogs that provide restaurant information are themselves recognized as a form of marketing. Portal sites that host blogs therefore allow such posts on the condition that they are disclosed as promotional. Advertising blogs, however, may contain false or exaggerated content disguised as first-hand reviews, so there is a need to filter them out from blogs that contain genuine reviews.

Spam mail filtering is a technology similar to filtering advertising blogs. However, spam mail filtering is difficult to apply to advertising blogs. Spam mail can usually be identified easily from the patterns or word distribution in the message alone, whereas advertising blogs are written to disguise the fact that they are advertisements. Filtering advertising blogs is therefore relatively harder than filtering spam mail.

Korean Patent Laid-Open Publication No. 2010-0068531, published June 24, 2010

The problem to be solved by the present invention is to provide a technique for selecting criteria for classifying whether content on the Internet, such as a blog, was written for advertising purposes, and for classifying content as advertising content or non-advertising content based on the selected criteria.

However, the problem to be solved by the present invention is not limited to this.

A content classification method according to an embodiment includes: classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as indicating the advertising nature of content; calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether that candidate is related to determining the advertising nature of the sample contents; selecting, based on the calculated degrees of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of content; and classifying at least one content as advertising content or non-advertising content based on the selected classifier.

In addition, the classifying of the sample contents may be based on the number of times the first classifier candidate appears in the content.

In addition, when the content includes an evaluation of a restaurant, the first classifier candidate may include at least one of: the number of times the word '맛집' (matjip, a restaurant known for good food) is mentioned in the title or body of the content; the number of times the restaurant's business name is mentioned in the content; and whether the restaurant's address is mentioned in the content.

In addition, the calculating may compute the degree of relevance by determining the extent to which each of the first classifier candidate and the second classifier candidate is included in the advertising content (the samples classified as advertising) and in the non-advertising content (the samples classified as non-advertising).

In addition, the calculating may include performing, for each of the first classifier candidate and the second classifier candidate, a correlation analysis against the advertising content and the non-advertising content, and calculating the degree of relevance based on the correlation analysis.

In addition, the second classifier candidate may include the degree of positive tendency or the degree of negative tendency of the words included in the content.

The content classification method according to an embodiment may be implemented as a computer-readable recording medium storing a computer program programmed to perform: classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as indicating the advertising nature of content; calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether that candidate is related to determining the advertising nature of the sample contents; selecting, based on the calculated degrees of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of content; and classifying at least one content as advertising content or non-advertising content based on the selected classifier.

The content classification method according to an embodiment may also be implemented as a computer program, stored on a computer-readable recording medium, that is programmed to perform: classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as indicating the advertising nature of content; calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether that candidate is related to determining the advertising nature of the sample contents; selecting, based on the calculated degrees of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of content; and classifying at least one content as advertising content or non-advertising content based on the selected classifier.

According to an embodiment, it is possible to select classifiers, that is, criteria for classifying whether content on the Internet such as a blog was written for advertising purposes, and to determine from the selected classifiers whether a given content is advertising content or non-advertising content.

Accordingly, when this technique is applied to a search engine or the like, Internet users can obtain more accurate information from results that exclude advertising content.

FIG. 1 is a diagram illustrating the steps of a content classification method according to an embodiment.
FIG. 2 is a diagram illustrating first classifier candidates according to an embodiment.
FIG. 3 is a diagram illustrating first classifier candidates and second classifier candidates according to an embodiment.
FIG. 4 is a diagram illustrating second classifier candidates according to an embodiment.
FIG. 5 is a diagram illustrating the correlations of classifier candidates according to an embodiment.
FIG. 6 is a diagram illustrating combinations of classifier candidates according to an embodiment.
FIG. 7 is a diagram illustrating an example in which the content classification method according to an embodiment is applied.

The advantages and features of the present invention, and the ways of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention belongs is fully informed of its scope; the invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or practice of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

Before the invention according to an embodiment is described, note that content here means something composed of text, images, video, or the like that can be searched for over the Internet. For example, content may include blog posts, newspaper articles, and posts uploaded to Internet communities. The subject of the content may be, for example, restaurants or electronic products.

The advertising nature of content means the degree to which the content indicates that its author wrote it at the request of someone who wanted advertising. Accordingly, advertising content is content that the author wrote at the request of a party seeking advertising, and non-advertising content is content that the author wrote of their own will and opinion without such a request.

FIG. 1 is a diagram illustrating each step of a content classification method according to an embodiment.

First, the content classification method shown in FIG. 1 can be performed on a computer or a server. The computer or server may be, for example, a device that provides a service such as a search engine, but is not limited to this. Depending on the embodiment, a step may be skipped or performed in a different order from that shown, and steps not shown may additionally be performed.

Referring to FIG. 1, the content classification method includes a step (S100) of classifying a plurality of contents as advertising or non-advertising based on at least one first classifier candidate designated as indicating the advertising nature of content.

Step S100 uses first classifier candidates. A first classifier candidate is a candidate item that may become a classifier. Here, a classifier is a criterion used to classify content as advertising content or non-advertising content.

For example, if an item appears mainly in advertising content and rarely in non-advertising content, or the reverse, so that its number of appearances differs markedly between the two, the advertising nature of content can be classified according to whether the item appears. In that case the item can be a classifier candidate. Conversely, if an item appears evenly in both advertising content and non-advertising content, or does not appear in either, the item cannot be a classifier candidate.

The first classifier candidates may include items designated in advance by a user or the like. Alternatively, they may be designated as follows, although the method is not limited to this. First, content that explicitly states that it is an advertisement is selected. This is determined by whether the content includes a statement such as "this content was written with sponsorship". Next, non-advertising content that is clearly not an advertisement is selected. Finally, items whose number of appearances differs markedly between the advertising content and the non-advertising content are extracted, and the extracted items are designated as first classifier candidates.
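
The extraction step described above could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `document_rate` and `pick_first_candidates` and the `min_gap` threshold are assumptions, and "markedly different" is approximated here as a gap in the share of documents that contain a term.

```python
def document_rate(term, docs):
    """Share of documents in `docs` that contain `term` (0.0 for an empty list)."""
    if not docs:
        return 0.0
    return sum(1 for doc in docs if term in doc) / len(docs)


def pick_first_candidates(terms, ad_docs, non_ad_docs, min_gap=0.5):
    """Keep terms whose document rate differs by at least `min_gap`
    between explicitly advertising and clearly non-advertising content.

    `min_gap` is a hypothetical threshold; the source only says the
    number of appearances must differ markedly.
    """
    picked = []
    for term in terms:
        gap = abs(document_rate(term, ad_docs) - document_rate(term, non_ad_docs))
        if gap >= min_gap:
            picked.append(term)
    return picked


# Toy data (invented for illustration only).
ad_docs = ["맛집 협찬 후기", "맛집 추천 후기"]
non_ad_docs = ["집밥 후기", "동네 산책 후기"]
candidates = pick_first_candidates(["맛집", "후기"], ad_docs, non_ad_docs)
```

In this toy data '후기' appears in every document on both sides, so only '맛집' survives the gap test.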

Examples of first classifier candidates are shown in FIG. 2, which illustrates first classifier candidates for the case where the content is about good restaurants ('맛집'). Referring to FIG. 2, advertising content and non-advertising content differ in the number of times the word '맛집' is mentioned in the title or body, the number of times the restaurant's business name is mentioned, and whether the restaurant's address is mentioned. Each of these three items can therefore be a first classifier candidate.
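
As a rough sketch of how the three example items might be measured for one post, assuming plain substring matching (the source does not specify how mentions are counted), the function name `first_candidate_features` and the dictionary keys below are hypothetical:

```python
def first_candidate_features(title, body, business_name, address):
    """Measure the three example first classifier candidates for one post.

    Plain substring counting is an assumption made for this sketch.
    """
    text = title + "\n" + body
    return {
        # times the word '맛집' appears in the title or body
        "matjip_count": text.count("맛집"),
        # times the restaurant's business name appears (0 if no name given)
        "name_count": text.count(business_name) if business_name else 0,
        # whether the restaurant's address appears at all
        "address_mentioned": bool(address) and address in text,
    }


# Toy post (invented for illustration only).
features = first_candidate_features(
    title="강남 맛집 추천",
    body="한성식당은 맛집입니다. 한성식당 주소: 서울시 성북구",
    business_name="한성식당",
    address="서울시 성북구",
)
```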

Referring again to FIG. 1, step S100 uses the first classifier candidates to classify a plurality of sample contents as advertising content or non-advertising content. For example, sample content that matches the first classifier candidates may be classified as advertising, and sample content that does not match may be classified as non-advertising.
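
One way to read "matches the first classifier candidates" is as a simple rule over measured values. The sketch below is only an assumption about what such a rule could look like: the name `label_sample`, the per-feature minimums, and the `min_hits` parameter are all hypothetical, and the source does not say how a match is decided.

```python
def label_sample(features, min_counts, min_hits=2):
    """Return 1 (advertising) if at least `min_hits` features reach their
    minimum value, else 0 (non-advertising).

    Both `min_counts` and `min_hits` are hypothetical tuning values.
    A boolean feature counts as 1 when True.
    """
    hits = sum(
        1 for name, minimum in min_counts.items()
        if features.get(name, 0) >= minimum
    )
    return 1 if hits >= min_hits else 0


# Hypothetical thresholds and toy samples (invented for illustration only).
min_counts = {"matjip_count": 2, "name_count": 4, "address_mentioned": 1}
ad_like = {"matjip_count": 3, "name_count": 5, "address_mentioned": True}
review_like = {"matjip_count": 0, "name_count": 1, "address_mentioned": False}
labels = [label_sample(ad_like, min_counts), label_sample(review_like, min_counts)]
```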

Here, sample content means content used to test whether the first classifier candidates, and the second classifier candidates described below, can serve as classifiers for determining the advertising nature of content; there may be a plurality of sample contents. Such sample content can be collected at random from public institutions, portal sites, and the like.

Next, a step (S110) is performed in which a degree of relevance, indicating whether a candidate is related to determining the advertising nature of the sample content, is calculated for each of the at least one second classifier candidate and the first classifier candidates.

The second classifier candidates include items different from the first classifier candidates. Compared with a first classifier candidate, a second classifier candidate's number of appearances may differ less markedly between advertising content and non-advertising content, although this is not a limitation.

The reasons for needing second classifier candidates are as follows, although these are only examples. Second classifier candidates may serve cases in which the advertising nature of content is hard to determine from the first classifier candidates alone, or cases in which it must be determined under conditions other than the first classifier candidates.

The second classifier candidates may include features of how the content is composed, such as particular words, expressions, or formats that appear in it. FIG. 3 shows second classifier candidates of this kind together with the first classifier candidates. Referring to FIG. 3, the top four variables are first classifier candidates, and the remaining variables are second classifier candidates. The second classifier candidates shown in FIG. 3 may include, for example, the number of words in the body, the day of the week on which the content was written, the length of the body, and whether a map is included.

Besides features of how the content is composed, the second classifier candidates may include whether the sentiment carried by the words in the content is positive, negative, or neutral. When writing, a content author can express a positive, negative, or neutral view of the subject through the words they choose. The sentiment of a word may mean, for example, that the word expresses praise or that it expresses a complaint.

Whether a word is positive, negative, or neutral can be obtained using an existing API or the like. Because this is a known technique, a detailed description is omitted.

FIG. 4 shows examples of features that represent the degree of sentiment among the second classifier candidates according to an embodiment. Referring to FIG. 4, POScore is a feature that judges whether the content is positive overall: the sentiment of the words in the content is evaluated, and the feature is set to 1 if the words judged positive outnumber the words judged negative, and to 0 otherwise. NAScore is scored in the opposite way to POScore.

CNOScore, CPOScore, and CNAScore are features representing the frequencies of neutral, positive, and negative words, respectively. They take quantitative values according to the number of times such words appear in the content.
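
The five sentiment features could be computed along the following lines once each word has a polarity label. This is a sketch under stated assumptions: the polarity labels ("positive", "negative", "neutral") are taken as already supplied by some external sentiment lookup, the function name `sentiment_features` is hypothetical, and NAScore is interpreted here as 1 when negative words outnumber positive words, which is one reading of "scored in the opposite way".

```python
def sentiment_features(word_polarities):
    """Compute POScore, NAScore, CNOScore, CPOScore and CNAScore.

    `word_polarities` is a list with one label per word:
    "positive", "negative" or "neutral" (assumed to come from an
    external sentiment lookup that is not shown here).
    """
    positive = word_polarities.count("positive")
    negative = word_polarities.count("negative")
    neutral = word_polarities.count("neutral")
    return {
        "POScore": 1 if positive > negative else 0,
        # Assumption: NAScore mirrors POScore with the roles swapped.
        "NAScore": 1 if negative > positive else 0,
        "CNOScore": neutral,
        "CPOScore": positive,
        "CNAScore": negative,
    }


# Toy labels (invented for illustration only).
scores = sentiment_features(["positive", "positive", "negative", "neutral"])
```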

Referring again to FIG. 1, step S110 calculates the degree of relevance of each first classifier candidate and each second classifier candidate. The degree of relevance indicates whether a given candidate is related to determining the advertising nature of content. A candidate with a high degree of relevance is judged to be related to that determination, and a candidate with a low degree of relevance is judged to be weakly related.

One way to calculate the degree of relevance is to base it on the extent, that is, the number of times, that each first and second classifier candidate is included in the sample contents classified as advertising and in those classified as non-advertising. In this case, the relatively higher the number of inclusions, the higher the degree of relevance, and the relatively lower the number of inclusions, the lower the degree of relevance.

Another way to calculate the degree of relevance is to perform, for each first and second classifier candidate, a correlation analysis against the sample contents classified as advertising and those classified as non-advertising, and to calculate the degree of relevance from the result. The Pearson correlation coefficient may be used for the correlation analysis, for example, in which case non-advertising content may be assigned the value 0 and advertising content the value 1.
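
With advertising coded as 1 and non-advertising as 0, the Pearson coefficient between a candidate's values and the labels is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}.$$

A minimal sketch of that calculation follows. The function name `pearson` and the toy numbers are illustrative, and returning 0.0 when either series is constant is an assumption made here to avoid division by zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a constant series as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented): a candidate's value per sample, and the S100 labels
# with non-advertising coded as 0 and advertising as 1.
candidate_values = [0, 1, 2, 3]
ad_labels = [0, 0, 1, 1]
r = pearson(candidate_values, ad_labels)
```

For this toy data the covariance term is 2.0 and the variance terms are 5 and 1, so r equals 2/√5, roughly 0.89.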

Here, the correlation analysis is performed to derive how strongly each first and second classifier candidate correlates with determining the advertising nature of content. The degree of correlation can be determined from the value of the correlation coefficient (r) as follows.

- If r is between -1.0 and -0.7: strong negative correlation (SN)

- If r is between -0.7 and -0.3: clear negative correlation (CN)

- If r is between -0.3 and -0.1: weak negative correlation (WN)

- If r is between -0.1 and +0.1: negligible correlation (IGN)

- If r is between +0.1 and +0.3: weak positive correlation (WP)

- If r is between +0.3 and +0.7: clear positive correlation (CP)

- If r is between +0.7 and +1.0: strong positive correlation (SP)

Because correlation analysis is a known technique, a detailed description is omitted.
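
The seven ranges listed above map directly onto a small lookup. In this sketch the function name `correlation_strength` is hypothetical, and because the source does not say which range a boundary value such as exactly 0.3 belongs to, boundary values are assigned to the stronger category as an assumption.

```python
def correlation_strength(r):
    """Map a correlation coefficient to SN, CN, WN, IGN, WP, CP or SP.

    Assumption: a value exactly on a boundary (for example 0.3 or -0.7)
    goes to the stronger of the two neighbouring categories.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1.0, 1.0]")
    if r <= -0.7:
        return "SN"   # strong negative
    if r <= -0.3:
        return "CN"   # clear negative
    if r <= -0.1:
        return "WN"   # weak negative
    if r < 0.1:
        return "IGN"  # negligible
    if r < 0.3:
        return "WP"   # weak positive
    if r < 0.7:
        return "CP"   # clear positive
    return "SP"       # strong positive


grades = [correlation_strength(v) for v in (-0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.9)]
```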

FIG. 5 shows the results of the correlation analysis performed on the first and second classifier candidates according to an embodiment. Referring to FIG. 5, the top four variables are first classifier candidates, and they have relatively high correlation coefficients. Among the remaining variables, mention of a phone number (phone), document length (content_length), map inclusion (map), CNOScore, and CPOScore show a clear correlation. What is shown in FIG. 5 is only an example.

Referring again to FIG. 1, a step (S120) is performed in which, based on the degrees of relevance calculated in step S110, at least one of the first and second classifier candidates is selected as a classifier for determining the advertising nature of content.

In selecting at least one of the classifier candidates as a classifier in step S120, variables having a relatively high degree of relevance may be selected, or such variables may be selected in combination. For example, as shown in FIG. 6, all of the variables shown in FIG. 5 may be combined (ALL), or only the first classifier candidates, which are the top four variables in FIG. 5, may be combined (Basis).
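The selection in step S120 can be pictured as filtering the candidates by the magnitude of their degree of relevance. The sketch below is illustrative only: the function name, the default threshold of 0.3 (the lower edge of the "distinct" range), and the relevance values are assumptions, and `image_count` is a made-up variable; only `phone`, `content_length`, and `map` are names that appear in the description.

```python
def select_classifiers(relevance, threshold=0.3):
    """Return candidate names whose |relevance| meets the threshold,
    ordered from the most to the least relevant.

    The threshold is an assumed cut-off; the source only says that
    variables with a relatively high degree of relevance may be selected.
    """
    selected = [name for name, r in relevance.items() if abs(r) >= threshold]
    return sorted(selected, key=lambda name: abs(relevance[name]), reverse=True)


# Hypothetical relevance values, not the values of FIG. 5.
example_relevance = {
    "phone": 0.41,
    "content_length": 0.35,
    "map": 0.33,
    "image_count": 0.04,
}

high_relevance = select_classifiers(example_relevance)
all_candidates = select_classifiers(example_relevance, threshold=0.0)
```

Setting the threshold to 0.0 keeps every candidate, which corresponds loosely to the ALL combination mentioned above.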

Next, a step (S130) of classifying at least one item of content as advertising content or non-advertising content based on the classifier selected in step S120 may be performed. A known naive Bayes classification or neural network classification may be applied as the classification scheme; since methods of classifying with these schemes are well known, a detailed description of them is omitted.
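As an illustration of step S130, the following is a minimal sketch of a Bernoulli naive Bayes classifier over binary features, with Laplace smoothing. The source does not specify which naive Bayes variant, which feature encoding, or which smoothing is used, so all of these are assumptions, as are the function names and the toy training data.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit per-class feature probabilities from binary feature dicts."""
    features = sorted({f for s in samples for f in s})
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for sample, label in zip(samples, labels):
        class_counts[label] += 1
        for f in features:
            if sample.get(f, 0):
                feature_counts[label][f] += 1
    total = len(labels)
    model = {"features": features, "classes": {}}
    for label, count in class_counts.items():
        model["classes"][label] = {
            "log_prior": math.log(count / total),
            # Laplace-smoothed P(feature present | class).
            "p": {
                f: (feature_counts[label][f] + alpha) / (count + 2 * alpha)
                for f in features
            },
        }
    return model


def predict(model, sample):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for f in model["features"]:
            p = params["p"][f]
            score += math.log(p if sample.get(f, 0) else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: hypothetical selected classifiers encoded as 0/1 features.
train_samples = [
    {"phone": 1, "map": 1},
    {"phone": 1, "map": 0},
    {"phone": 0, "map": 0},
    {"phone": 0, "map": 0},
]
train_labels = ["ad", "ad", "non_ad", "non_ad"]
nb_model = train_bernoulli_nb(train_samples, train_labels)
```

Calling `predict(nb_model, {"phone": 1, "map": 1})` then returns the label for a new item of content described by the same features.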

FIG. 7 is a diagram showing the classification performance, that is, the accuracy, obtained when the content classification method according to an embodiment is carried out using naive Bayes classification or neural network classification. Referring to FIG. 7, neural network classification performs better overall than naive Bayes classification.
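Accuracy in this sense is commonly taken as the fraction of items whose predicted class matches the reference class. A minimal sketch under that assumption follows; the source does not define how the accuracy in FIG. 7 was measured, and the function name is hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```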

As described above, according to an embodiment, a classifier, which is a criterion for classifying whether content on the Internet such as a blog was written for advertising purposes, can be selected according to the procedure described above, and whether content is advertising content or non-advertising content can be determined based on the classifier thus selected.

Combinations of each block of the block diagrams and each step of the flowcharts attached to the present invention may be performed by computer program instructions. Because these computer program instructions may be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by the processor of the computer or other programmable data processing equipment create means for performing the functions described in each block of the block diagram or each step of the flowchart. These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, so that the instructions stored in the computer-usable or computer-readable memory can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram or each step of the flowchart. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable data processing equipment to create a computer-executed process, and the instructions that run on the computer or other programmable data processing equipment can provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.

In addition, each block or each step may represent a module, a segment, or a portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or the blocks or steps may sometimes be performed in reverse order, depending on the corresponding function.

The above description merely illustrates the technical idea, and a person having ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed in the present invention are intended to describe, not to limit, the technical idea, and the scope of the technical idea is not limited by these embodiments. The scope of protection should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights.

Claims

A content classification method comprising:
classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as determining the advertising nature of content;
calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether the candidate is related to determining the advertising nature of the sample contents;
selecting, based on the calculated degree of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of the content; and
classifying the content as advertising content or non-advertising content based on the selected classifier.

The content classification method according to claim 1,
wherein the classifying of the plurality of sample contents as advertising or non-advertising based on the at least one first classifier candidate
is performed based on the number of times the first classifier candidate is included in each of the plurality of sample contents.

The content classification method according to claim 1,
wherein, when the sample content includes an evaluation of a restaurant,
the first classifier candidate comprises at least one of:
the number of times the word '맛집' (a restaurant known for good food) is mentioned in the title or body of the sample content, the number of times the business name of the restaurant is mentioned in the sample content, and whether the address of the restaurant is mentioned in the sample content.

The content classification method according to claim 1,
wherein the calculating comprises
calculating the degree of relevance based on determining the extent to which each of the first classifier candidate and the second classifier candidate is included in the advertising content classified as advertising and in the non-advertising content classified as non-advertising, respectively.

The content classification method according to claim 1,
wherein the calculating comprises:
performing, for each of the first classifier candidate and the second classifier candidate, a correlation analysis with respect to the advertising content and the non-advertising content; and
calculating the degree of relevance based on the performed correlation analysis.

The content classification method according to claim 1,
wherein the second classifier candidate comprises
a degree of positive tendency or a degree of negative tendency of words included in the content.

A computer-readable recording medium storing a computer program programmed to perform:
classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as determining the advertising nature of content;
calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether the candidate is related to determining the advertising nature of the sample contents;
selecting, based on the calculated degree of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of the content; and
classifying at least one item of content as advertising content or non-advertising content based on the selected classifier.

A computer program stored on a computer-readable recording medium, the computer program being programmed to perform:
classifying a plurality of sample contents as advertising or non-advertising based on at least one first classifier candidate designated as determining the advertising nature of content;
calculating, for each of the first classifier candidate and at least one second classifier candidate different from the first classifier candidate, a degree of relevance indicating whether the candidate is related to determining the advertising nature of the sample contents;
selecting, based on the calculated degree of relevance, at least one of the first classifier candidate and the second classifier candidate as a classifier for determining the advertising nature of the content; and
classifying at least one item of content as advertising content or non-advertising content based on the selected classifier.