ABSTRACT:
This paper investigates how demand-side factors contribute to the Internet’s “Long Tail” phenomenon. It first models how a reduction in search costs will affect the concentration in product sales. Then, by analyzing data collected from a multi-channel retailing company, it provides empirical evidence that the Internet channel exhibits a significantly less concentrated sales distribution when compared with traditional channels. The difference in the sales distribution is highly significant, even after controlling for consumer differences. Furthermore, the effect is particularly strong for individuals with more prior experience using the Internet channel. We find evidence that Internet purchases made by consumers with prior Internet experience are more skewed toward obscure products, compared with consumers who have no such experience. We observe the opposite outcome when comparing purchases by the same consumers through the catalog channel. If the relationships we uncover persist, the underlying trends in technology and search costs portend an ongoing shift in the distribution of product sales.

Singular Value Decomposition (SVD), together with the Expectation-Maximization (EM) procedure, can be used to find a low-dimension model that maximizes the log-likelihood of observed ratings in recommendation systems. However, the computational cost of this approach is a major concern, since each iteration of the EM algorithm requires a new SVD computation. We present a novel algorithm that incorporates SVD approximation into the EM procedure to reduce the overall computational cost while maintaining accurate predictions. Furthermore, we propose a new framework for collaborative filtering in distributed recommendation systems that allows users to maintain their own rating profiles for privacy.

We conduct offline and online tests of our ranking algorithm. For offline testing, we use Yahoo! Search queries that resulted in a click on a Yahoo! Movies or Internet Movie Database (IMDB) movie URL. Our online test involved 44 Yahoo! employees providing subjective assessments of result quality. In both tests, our ranking methods show significantly better recall and quality than IMDB search and the current Yahoo! Movies search.

Reduced-rank approximation of matrices has hitherto been possible only by unweighted least squares. This paper presents iterative techniques for obtaining such approximations when weights are introduced. The techniques involve criss-cross regressions with careful initialization. Possible applications of the approximation are in modeling, biplotting, contingency table analysis, fitting of missing values, checking for outliers, etc.
1. Introduction
Collaborative filtering analyzes a user preferences database to predict additional products or services in which a user might be interested. The goal is to predict the preferences of a user based on the preferences of others with similar tastes. There are two general classes of collaborative filtering algorithms, and low-dimension linear models are a popular means of describing user preferences. The following are representative state-of-the-art collaborative filtering algorithms. Assuming that the user preferences database is generated from a linear model, matrix-factorization-based collaborative filtering methods obtain an explicit linear model to approximate the original user preferences matrix; other methods use the Pearson correlation coefficient, which is equivalent to a linear fit. If we assume that users’ ratings are generated from a low-dimension linear model together with Gaussian-distributed noise, the Singular Value Decomposition (SVD) technique can be used to find the linear model that maximizes the log-likelihood of the rating matrix, assuming it is complete. If the rating matrix is incomplete, as is the case in real-world systems, SVD cannot be applied directly. The Expectation-Maximization (EM) procedure can be used to find the model that maximizes the log-likelihood of the available ratings, but this requires an SVD computation of the whole matrix for each EM iteration. As the size of the rating matrix is usually huge (due to the large numbers of users and items in typical recommendation systems), the computational cost of SVD becomes an important concern. Deterministic SVD methods for computing all singular vectors of an m-by-n matrix take O(mn^2 + m^2n) time, and the Lanczos method requires roughly O(kmn log(mn)) time to approximate the top k singular vectors [10]. In this work, we present a novel algorithm based on an SVD approximation.

There are several challenges in adapting item authority to these information retrieval systems, because documents such as item or product information pages on commercial sites have different characteristics from web documents. The power of PageRank and HITS stems from the links between web documents: both algorithms assume that a link from document i to document j represents a recommendation or endorsement of document j by the owner of document i. In item information pages on commercial sites, however, links often represent kinds of relationships other than recommendation. For example, two items may be linked because both are produced by the same company. Also, since these item information pages are generally created by providers rather than users or customers, the documents may reflect the providers' perspective on the items rather than that of users or customers. Recommender systems, on the other hand, are widely used in e-commerce sites to overcome information overload. Note that information retrieval systems work somewhat passively, while recommender systems seek out the needs of a user more actively.
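The EM procedure described above can be sketched concretely. The following is a minimal illustration (not the paper's actual implementation): the E-step fills the missing entries of the rating matrix from the current model, and the M-step recomputes the best rank-k approximation via SVD. The function name and toy data are hypothetical.

```python
import numpy as np

def em_svd_complete(R, mask, k, n_iters=50):
    """EM-style completion of a rating matrix with a rank-k SVD model.

    R    : m-by-n rating matrix (entries where mask is False are ignored)
    mask : boolean m-by-n array, True where a rating is observed
    k    : rank of the linear model
    """
    # Initialization: fill missing entries with the mean observed rating.
    X = np.where(mask, R, R[mask].mean())
    low_rank = X
    for _ in range(n_iters):
        # M-step: best rank-k approximation of the completed matrix.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
        # E-step: keep observed ratings, impute the rest from the model.
        X = np.where(mask, R, low_rank)
    return low_rank

# Toy example: a rank-1 rating matrix with one rating hidden.
R = np.outer([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0])
mask = np.ones_like(R, dtype=bool)
mask[0, 2] = False          # hide the true rating 3.0
pred = em_svd_complete(R, mask, k=1)
print(round(pred[0, 2], 2))  # recovers a value close to 3.0
```

Each iteration costs one full SVD of the m-by-n matrix, which is exactly the bottleneck the proposed SVD approximation targets.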
2. RELATED WORK
Recommender systems can be built in three ways:
content-based filtering, collaborative filtering, and hybrid systems. Content-based recommender systems, sometimes called information filtering systems, use behavioral data for a single user in order to infer the types of item attributes that the user is interested in. Collaborative filtering compares one user's behavior against a database of other users' behaviors in order to identify items that like-minded users are interested in. Even though content-based recommender systems are efficient at filtering out unwanted information and generating recommendations for a user from massive amounts of information, they find few if any coincidental discoveries. Collaborative filtering systems, on the other hand, enable serendipitous discoveries by using historical user data.
Collaborative filtering algorithms range from simple nearest-neighbor methods to more complex machine-learning-based methods such as graph-based methods, linear-algebra-based methods, and probabilistic methods. A few variations of filterbot-based algorithms, as well as hybrid methods that combine content-based and collaborative filtering, have also been proposed to attack the so-called cold-start problem. Tapestry is one of the earliest recommender systems. In this system, each user records their opinions (annotations) of documents they read, and these annotations are accessed by others' filters. GroupLens, Ringo, and Video Recommender are the earliest fully automatic recommender systems, providing recommendations of news, music, and movies. PHOAKS (People Helping One Another Know Stuff) crawls web messages and extracts recommendations from them rather than using users' explicit ratings. GroupLens has also developed a movie recommender system called MovieLens. Fab is the first hybrid recommender system, using a combination of content-based and collaborative filtering techniques for web recommendations. Tango provides online news recommendations, and Jester provides recommendations of jokes. The PageRank and HITS algorithms introduced a new concept of document relevance, often called document authority, for better precision in web search; both algorithms analyze the link structure of the Web to calculate document authorities. Haveliwala proposed topic-sensitive PageRank, which generates multiple document authorities, each biased toward a specific topic, for better document ranking. Note that our approach differs from general web search engines in that we use user ratings rather than link structure to generate item authorities. It also differs from topic-sensitive PageRank in that we provide personalized item authorities for each user rather than topic-biased item authorities. Finally, our approach differs from recommender systems in that it uses predictions of items as a ranking function for information search rather than for generating recommendations.
3. SVD Approximation in Centralized Recommendation Systems
In this section, we discuss how to use SVD approximation to reduce the computational cost of SVD-based collaborative filtering in traditional centralized recommendation systems, where the server keeps all users’ rating profiles. If the server uses the EM procedure shown in Section
Frequently asked questions
What is the main topic of the text?
The text primarily discusses collaborative filtering and recommendation systems, focusing on techniques to improve their efficiency and accuracy. A central theme is the application of Singular Value Decomposition (SVD) and its approximations to reduce the computational cost associated with these systems.
What is the "Long Tail" phenomenon, and how does it relate to the Internet, according to the abstract?
According to the abstract, the "Long Tail" phenomenon refers to the less concentrated sales distribution observed in the Internet channel compared to traditional channels. The paper investigates how demand-side factors, particularly reduced search costs, contribute to this phenomenon. The analysis suggests that Internet users, especially those with prior experience, are more likely to purchase obscure products.
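One way to make "less concentrated" tangible is to compare what share of total sales the top-selling products capture in each channel. The sketch below uses entirely hypothetical sales figures (not the paper's data) purely to illustrate the measurement.

```python
import numpy as np

def top_share(sales, fraction=0.2):
    """Share of total sales captured by the top `fraction` of products."""
    s = np.sort(np.asarray(sales, dtype=float))[::-1]
    n_top = max(1, int(round(fraction * len(s))))
    return s[:n_top].sum() / s.sum()

# Hypothetical sales vectors: a head-heavy catalog channel
# versus a flatter Internet channel (each totals 1000 units).
catalog  = [500, 300, 80, 40, 30, 20, 15, 10, 3, 2]
internet = [200, 180, 150, 120, 100, 90, 80, 40, 25, 15]

print(top_share(catalog))   # → 0.8  (top 20% of products = 80% of sales)
print(top_share(internet))  # → 0.38 (far less concentrated)
```

A lower top-share (equivalently, a fatter tail) for the Internet channel is what the abstract means by a "less concentrated sales distribution."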
What is Singular Value Decomposition (SVD), and why is it important in recommendation systems?
Singular Value Decomposition (SVD) is a technique used to find a low-dimension model that maximizes the log-likelihood of observed ratings in recommendation systems. It's important because it helps predict user preferences based on patterns in a large user ratings database. However, the computational cost of SVD, especially when combined with the Expectation-Maximization (EM) procedure, can be a major bottleneck.
What is the proposed solution to the computational cost of SVD in recommendation systems?
The text proposes a novel algorithm that incorporates SVD approximation into the EM procedure. This aims to reduce the overall computational cost while maintaining accurate predictions. This involves sampling rows of a matrix and creating a new matrix whose singular vectors approximate the original matrix.
What are the differences between content-based filtering and collaborative filtering?
Content-based filtering infers user interests based on their own past behavior. Collaborative filtering compares a user's behavior to the behavior of other users with similar tastes to identify items they might be interested in.
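The collaborative side of this comparison can be sketched with a simple user-based nearest-neighbor predictor: score an unseen item for a user by averaging the ratings of the most similar users, weighted by cosine similarity. This is a generic textbook sketch, not the system described in the text.

```python
import numpy as np

def predict_user_based(R, mask, user, item, k=2):
    """Predict R[user, item] from the k most similar users (cosine similarity)."""
    # Treat unobserved ratings as 0 for the similarity computation.
    target = np.where(mask[user], R[user], 0.0)
    sims = []
    for u in range(R.shape[0]):
        if u == user or not mask[u, item]:
            continue  # skip the user themselves and users who haven't rated the item
        other = np.where(mask[u], R[u], 0.0)
        denom = np.linalg.norm(target) * np.linalg.norm(other)
        sims.append((np.dot(target, other) / denom if denom else 0.0, u))
    sims.sort(reverse=True)
    top = sims[:k]
    # Similarity-weighted average of the neighbors' ratings.
    num = sum(s * R[u, item] for s, u in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else np.nan

# Toy data: user 0 agrees with user 1, who rated item 3 highly.
R = np.array([[5., 4., 0., 0.],
              [5., 4., 0., 5.],
              [1., 0., 5., 1.]])
mask = np.array([[True, True, True, False],
                 [True, True, True, True],
                 [True, False, True, True]])
print(predict_user_based(R, mask, user=0, item=3, k=1))  # → 5.0
```

Note how the prediction for user 0 comes entirely from other users' behavior, with no item attributes involved; a content-based system would instead inspect the attributes of items user 0 already liked.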
What is the role of "item authority" in information retrieval systems discussed in this document?
The text mentions challenges in adapting item authority in information retrieval systems due to the difference between item/product information documents in commercial sites versus regular web documents. Traditional link-based authority measures like PageRank and HITS are not always applicable because links between items may represent different relationships beyond recommendations, and provider-created content may not reflect user perspectives.
What are some of the recommender systems mentioned in the text?
The text mentions several recommender systems, including Tapestry, GroupLens, Ringo, Video Recommender, PHOAKS, MovieLens, Fab, Tango, and Jester.
How is the proposed approach different from general web search engines and topic-sensitive PageRank?
The approach differs from general web search engines by using user ratings rather than link structure for generating item authorities. It also differs from topic-sensitive PageRank by providing personalized item authorities for each user, rather than topic-biased item authorities.
How does the SVD approximation technique work and what does it accomplish?
The SVD approximation involves sampling and scaling rows from a matrix to create a smaller matrix. The top right singular vectors of this smaller matrix then approximate the top right singular vectors of the original matrix, significantly reducing the computational burden of SVD while still providing a reasonable approximation for use in recommendation systems.
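The sampling-and-scaling step can be sketched along the lines of standard randomized low-rank approximation: sample rows with probability proportional to their squared norm, rescale them so the small matrix matches the original in expectation, and take the right singular vectors of the result. This is a generic sketch of the technique the FAQ describes, with hypothetical function and parameter names, not the paper's exact algorithm.

```python
import numpy as np

def sampled_right_singular_vectors(A, c, k, seed=None):
    """Approximate the top-k right singular vectors of A by sampling c rows.

    Rows are drawn with probability proportional to their squared norm and
    rescaled so that S.T @ S approximates A.T @ A in expectation.
    """
    rng = np.random.default_rng(seed)
    probs = (A ** 2).sum(axis=1)
    probs = probs / probs.sum()
    idx = rng.choice(A.shape[0], size=c, replace=True, p=probs)
    # Scale each sampled row by 1 / sqrt(c * p_i).
    S = A[idx] / np.sqrt(c * probs[idx])[:, None]
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:k]

# Toy check: a nearly rank-1 matrix whose top right singular vector is ~e1.
rng = np.random.default_rng(0)
v = np.array([1., 0., 0., 0., 0.])
A = np.outer(rng.standard_normal(200), v) + 0.01 * rng.standard_normal((200, 5))
Vt_approx = sampled_right_singular_vectors(A, c=100, k=1, seed=1)
_, _, Vt_exact = np.linalg.svd(A, full_matrices=False)
print(abs(Vt_approx[0] @ Vt_exact[0]))  # close to 1: directions agree
```

The payoff is that the SVD is computed on a c-by-n matrix instead of the full m-by-n rating matrix, which is what makes repeating it inside every EM iteration affordable.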
- Saravana Kumar (Author), Naveen Kumar (Author), 2013, Optimized Ranking-Based Techniques for Improving Aggregate Recommendation Diversity, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/266178