
Creating Document Embeddings with Negative Sampling for Document-level Sentiment Classification

To perform document-level sentiment classification, each document must be mapped to a dense, low-dimensional vector in a continuous vector space. In deep learning, we can use models such as LSTMs or CNNs to learn these representations.

However, the paper Sentiment Classification using Document Embeddings trained with Cosine Similarity proposes a new method for creating document vectors.

In word2vec, negative sampling is used to learn word embeddings. This paper applies the same technique to learn document vectors.

The source code is here: https://github.com/tanthongtan/dv-cosine

How about the objective function?

To create document vectors by training, we need to build an objective function.

[Equation: the negative-sampling objective used to train document vectors]

This objective function has the form of a logistic regression loss.

Here, \(\sigma\) is the sigmoid function.

See also: Understand Sigmoid Function: Properties and Derivative – Machine Learning Tutorial
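For reference, here is a sketch of this objective in the standard word2vec-style negative-sampling form; the notation (document vector \(v_d\), observed word vector \(v_{w_O}\), and \(k\) negative words drawn from a noise distribution \(P_n(w)\)) is our own and may differ slightly from the paper's:

\[
J = -\log \sigma\left(v_{w_O}^{\top} v_d\right) \;-\; \sum_{n=1}^{k} \mathbb{E}_{w_n \sim P_n(w)}\left[\log \sigma\left(-v_{w_n}^{\top} v_d\right)\right]
\]

This loss is summed over every document in the corpus and every word it contains, and is minimized by stochastic gradient descent.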

There are also other objective functions:

L2R dot product

[Equation: the L2R dot product objective]
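As a rough sketch, the L2R (L2-regularized) dot product variant keeps the same negative-sampling loss but adds an L2 penalty on the embeddings; the regularization weight \(\lambda\) and exactly which vectors are penalized are assumptions here, so consult the paper for the precise form:

\[
J_{\text{L2R}} = -\log \sigma\left(v_{w_O}^{\top} v_d\right) \;-\; \sum_{n=1}^{k} \log \sigma\left(-v_{w_n}^{\top} v_d\right) \;+\; \frac{\lambda}{2}\left(\lVert v_d \rVert^{2} + \lVert v_{w_O} \rVert^{2}\right)
\]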

How about document embeddings?

Here is the result:

[Figure: the resulting document embeddings]
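To make the training procedure concrete, below is a minimal sketch in Python/NumPy of one negative-sampling update for a document vector under the dot-product objective above. The function and variable names, the learning rate, the embedding size, and the number of negative samples are illustrative assumptions, not taken from the paper or its repository.

import numpy as np

rng = np.random.default_rng(0)
dim = 100          # embedding size (assumption)
lr = 0.025         # learning rate (assumption)
k = 5              # number of negative samples (assumption)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_update(doc_vec, word_vecs, target_idx, noise_dist):
    """One SGD step on -log sigma(v_w . v_d) - sum log sigma(-v_neg . v_d)."""
    grad_doc = np.zeros_like(doc_vec)

    # Positive (observed) word: pull its vector and the document vector together.
    v_pos = word_vecs[target_idx]
    g = sigmoid(np.dot(v_pos, doc_vec)) - 1.0      # gradient of the loss w.r.t. the dot product
    grad_doc += g * v_pos
    word_vecs[target_idx] -= lr * g * doc_vec

    # Negative (noise) words: push their vectors and the document vector apart.
    neg_idx = rng.choice(len(word_vecs), size=k, p=noise_dist)
    for j in neg_idx:
        v_neg = word_vecs[j]
        g = sigmoid(np.dot(v_neg, doc_vec))        # label is 0 for noise words
        grad_doc += g * v_neg
        word_vecs[j] -= lr * g * doc_vec

    doc_vec -= lr * grad_doc
    return doc_vec

Running this update over every (document, word) pair for several epochs yields the document vectors; the repository linked above contains the paper's actual implementation.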

Datasets used for Document-level Sentiment Analysis

The IMDB dataset is used in this paper. It is a binary sentiment classification dataset consisting of movie reviews retrieved from IMDB, and it contains 25,000 training documents, 25,000 test documents, and 50,000 unlabeled documents.
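For illustration, here is a small loader for this dataset, assuming the standard aclImdb directory layout from the original IMDB release (train/pos, train/neg, train/unsup, test/pos, test/neg); the paths and helper name are assumptions.

import os

def read_imdb_split(root, split):
    """Read (text, label) pairs from an aclImdb-style folder; label is None for unlabeled data."""
    data = []
    labels = {"pos": 1, "neg": 0, "unsup": None}
    for sub, label in labels.items():
        folder = os.path.join(root, split, sub)
        if not os.path.isdir(folder):              # e.g. test/ has no unsup/ folder
            continue
        for name in os.listdir(folder):
            if name.endswith(".txt"):
                with open(os.path.join(folder, name), encoding="utf-8") as f:
                    data.append((f.read(), label))
    return data

# Example usage (assuming the dataset was extracted to ./aclImdb):
# train = read_imdb_split("aclImdb", "train")   # 25,000 labeled + 50,000 unlabeled reviews
# test  = read_imdb_split("aclImdb", "test")    # 25,000 labeled reviews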

Comparative Results

The comparative results are below:

[Table: experiment results for document-level sentiment classification on IMDB using document embeddings]
