DEV Community
petercour


Newsgroup Text classification with Machine Learning

Text can be classified automatically. As with anything in machine learning, it needs data. So what data are we going to use?

The Data

Let's say our source of data is the 20 newsgroups data set, which sklearn loads with fetch_20newsgroups.
This data set contains the text of nearly 20,000 newsgroup posts partitioned across 20 different newsgroups.

The dataset is quite old, but that doesn't matter. You can find the original homepage here: 20 news groups dataset

The data set is available through the Python machine learning module sklearn, which downloads and caches it the first time fetch_20newsgroups is called.

To simplify, we'll only take two newsgroups: "rec.motorcycles" and "rec.sport.hockey".

#!/usr/bin/python3
from sklearn.datasets import fetch_20newsgroups

news = fetch_20newsgroups(subset="all", categories=['rec.sport.hockey', 'rec.motorcycles'])

Test the Algorithm

Before using the classifier, you want to know how well it works. That is done by splitting the data set into a train set and a test set.

#!/usr/bin/python3
from sklearn.model_selection import train_test_split

# By default, 25% of the samples are held out as the test set
x_train, x_test, y_train, y_test = train_test_split(news.data, news.target)
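Because train_test_split shuffles the data randomly, every run produces a different split and a slightly different score. Here is a minimal sketch of how the random_state, test_size and stratify parameters can make the split reproducible and class-balanced. It assumes a tiny made-up corpus in place of news.data and news.target:

```python
from sklearn.model_selection import train_test_split

# Made-up stand-in for news.data / news.target (not the real data set)
texts = [
    "new tires for my bike", "oil change on the motorcycle",
    "helmet laws in my state", "riding in the rain",
    "great save by the goalie", "playoff game went to overtime",
    "the team traded its captain", "power play goal in the third",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

x_train, x_test, y_train, y_test = train_test_split(
    texts,
    labels,
    test_size=0.25,     # hold out a quarter of the samples (also the default)
    random_state=42,    # fixed seed, so the same split comes back every run
    stratify=labels,    # keep the class proportions equal in both parts
)

print(len(x_train), len(x_test))
print(sorted(y_test))
```

With 8 samples this holds out 2 for testing, one from each class.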

The data we're dealing with is text, but the classifier needs numeric vectors. Use the TfidfVectorizer to convert it. That gives us two matrices of TF-IDF features: x_train and x_test.

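#!/usr/bin/python3
from sklearn.feature_extraction.text import TfidfVectorizer

transfer = TfidfVectorizer()
# Learn the vocabulary from the training text only, then reuse it for the test text
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)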
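The test text goes through transform rather than fit_transform because the vocabulary has to come from the training text alone. A small sketch with made-up sentences (not the newsgroup posts) shows that both matrices end up with the same columns and that a word never seen during fitting is simply ignored:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sentences, only to illustrate the shapes involved
train_docs = ["the bike is fast", "the game is tonight", "fast game"]
test_docs = ["zamboni game"]  # "zamboni" does not occur in train_docs

transfer = TfidfVectorizer()
x_train = transfer.fit_transform(train_docs)  # learns vocabulary and idf weights
x_test = transfer.transform(test_docs)        # reuses them unchanged

print(sorted(transfer.vocabulary_))  # one column per distinct training word
print(x_train.shape, x_test.shape)   # same number of columns in both
print(x_test.nnz)                    # number of non-zero entries in the test row
```

Only "game" contributes to the test row, since "zamboni" has no column.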

No need to change y_train and y_test, as those are the output labels (class 0 or class 1).

Create the classifier object, here a Multinomial Naive Bayes model, and train it with the training data.

#!/usr/bin/python3
from sklearn.naive_bayes import MultinomialNB

estimator = MultinomialNB()
estimator.fit(x_train, y_train)

Then you can make predictions and see how well it classifies the test data:

#!/usr/bin/python3
# Predicted class (0 or 1) for every post in the test set
y_predict = estimator.predict(x_test)
print("y_predict:\n", y_predict)
# Fraction of test posts that were classified correctly
score = estimator.score(x_test, y_test)
print("score:\n", score)
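estimator.score reports the mean accuracy, that is, the fraction of test posts that got the right label. As a sketch, using made-up labels rather than the real predictions, a confusion matrix can break that single number down per class:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration; in the article these would be
# y_test and y_predict
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

# Fraction of correct predictions, the same quantity estimator.score reports
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)

# Rows are the true classes, columns are the predicted classes
matrix = confusion_matrix(y_true, y_pred)
print(matrix)
```

Here one class 0 sample is mistaken for class 1, so 5 of the 6 predictions are correct.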

Run the program and you'll see the accuracy. The exact value varies from run to run, because the split is random:

score:
 0.9939879759519038

Make your own predictions

You can make predictions with new texts:

Enter some text: i like to drive motor cycle on the highway
y_predict:
 [0]
Enter some text: i like to play hockey game
y_predict:
 [1]

To do so, add these lines after the classifier has been trained:

#!/usr/bin/python3
sentence = input("Enter some text: ")
# transform expects a list of documents, so wrap the sentence in a list
sentence_x = transfer.transform([sentence])
y_predict = estimator.predict(sentence_x)
print("y_predict:\n", y_predict)
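The prediction is a bare number, which is not very readable. A possible sketch of turning it back into a newsgroup name is shown here on a few made-up training sentences. The target_names list is hard-coded as an assumption of what news.target_names holds for the two chosen categories, and predict_group is a hypothetical helper name:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training sentences standing in for the newsgroup posts
docs = [
    "my motorcycle engine needs new oil",
    "riding the bike on the highway",
    "helmet and gloves for the ride",
    "the hockey team won the game",
    "the goalie made a great save",
    "playoff game tonight on ice",
]
labels = [0, 0, 0, 1, 1, 1]
# Assumed to mirror news.target_names for the two chosen categories
target_names = ["rec.motorcycles", "rec.sport.hockey"]

transfer = TfidfVectorizer()
estimator = MultinomialNB()
estimator.fit(transfer.fit_transform(docs), labels)


def predict_group(sentence):
    """Return the newsgroup name predicted for one sentence."""
    label = estimator.predict(transfer.transform([sentence]))[0]
    return target_names[label]


print(predict_group("i like to play hockey game"))
print(predict_group("my motorcycle needs oil"))
```

With the real data, indexing news.target_names with the predicted label should work the same way.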

The program

The program below does it all, from loading the data to printing the score:

#!/usr/bin/python3
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split


def nb_news():
    # Load the posts of the two chosen newsgroups
    news = fetch_20newsgroups(subset="all", categories=['rec.sport.hockey', 'rec.motorcycles'])

    # Split into a train set and a test set
    x_train, x_test, y_train, y_test = train_test_split(news.data, news.target)

    # Turn the text into TF-IDF feature vectors
    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Train the Naive Bayes classifier
    estimator = MultinomialNB()
    estimator.fit(x_train, y_train)

    # Predict on the test set and report the accuracy
    y_predict = estimator.predict(x_test)
    print("y_predict:\n", y_predict)
    score = estimator.score(x_test, y_test)
    print("score:\n", score)
    return None


nb_news()
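As an alternative design, not what the program above does, sklearn's make_pipeline can bundle the vectorizer and the classifier into one object, so raw strings go straight into fit and predict. A minimal sketch on made-up sentences:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up sentences standing in for the newsgroup posts
docs = [
    "my motorcycle engine needs new oil",
    "helmet and gloves for the ride",
    "the hockey team won the game",
    "the goalie made a great save",
]
labels = [0, 0, 1, 1]

# The pipeline vectorizes the text and feeds the result to the classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

predictions = model.predict(["the goalie won the game", "new oil for my motorcycle"])
print(predictions)
```

The upside is that there is no separate transfer.transform call to forget when predicting on new text.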
