xlvector's solution of NetflixPrize

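Monday, May 4, 2009

Implemented the RBM algorithm today

Thanks to help from Wang Yuantao (王元涛), I implemented the RBM model today. With 50 hidden units it reaches RMSE = 0.924 on Probe. Blending this result into my existing set of predictors improved the Quiz RMSE from 0.8694 to 0.8688.

It looks like RBM really is a broad family of methods that can compensate for the weaknesses of other methods. I am now experimenting with RBMs that have more hidden units, and I have already tried an RBM with Gaussian hidden units.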

Tuesday, April 21, 2009

Two recommended Chinese-language introductions to RBM

Sunday, April 19, 2009

Top40

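Tuesday, April 14, 2009

Parallelizing the program (multi-thread)

The machine I use has 4 cores, so to make full use of it I reimplemented the SVD-based models with multiple threads today, and they now run much faster. SVD is easy to parallelize: my strategy is to parallelize the loop that scans the data, and to use a lock for mutual exclusion whenever the model is modified.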
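A minimal Python sketch of this strategy, under stated assumptions: the model is a plain matrix factorization trained by SGD, and the names `train_epoch_parallel`, `lr`, `reg` and `n_threads` are hypothetical, not taken from the original program. The scan over the ratings is split across threads, each thread computes its update without holding the lock, and the lock is taken only while the factors are modified. In CPython the global interpreter lock generally prevents a real speedup for a pure-Python loop like this, so the sketch shows the locking structure rather than the performance gain.

```python
import random
import threading

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(ratings, P, Q):
    se = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def train_epoch_parallel(ratings, P, Q, lr=0.01, reg=0.02, n_threads=4):
    """One SGD pass; the data scan is split across threads and every
    write to P or Q happens under a single lock."""
    lock = threading.Lock()

    def worker(chunk):
        for u, i, r in chunk:
            # Read a snapshot and compute the step outside the lock.
            pu, qi = list(P[u]), list(Q[i])
            err = r - dot(pu, qi)
            d_pu = [lr * (err * b - reg * a) for a, b in zip(pu, qi)]
            d_qi = [lr * (err * a - reg * b) for a, b in zip(pu, qi)]
            # Mutual exclusion only while modifying the model.
            with lock:
                for f in range(len(d_pu)):
                    P[u][f] += d_pu[f]
                    Q[i][f] += d_qi[f]

    # Interleaved split of the ratings, one chunk per thread.
    chunks = [ratings[t::n_threads] for t in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Synthetic demo data (not Netflix data): ratings generated from random factors.
random.seed(0)
n_users, n_items, k = 20, 15, 3
true_P = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_users)]
true_Q = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_items)]
ratings = [(u, i, dot(true_P[u], true_Q[i]))
           for u in range(n_users) for i in range(n_items)]

P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

before = rmse(ratings, P, Q)
for _ in range(50):
    train_epoch_parallel(ratings, P, Q)
after = rmse(ratings, P, Q)
print("RMSE before:", before, "after:", after)
```

Because the step is computed from a snapshot taken outside the lock, a thread may apply a slightly stale update; the lock only guarantees that two threads never write the same factors at the same time.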

Friday, April 10, 2009

clustering items and users by latent factors?

With an SVD model we can compute latent factors for users and items: p(u) is the latent factor vector of user u, and q(i) is the latent factor vector of item i.

Recently I have been thinking about computing user similarity from these latent factors, for example:

s(u,v) = f(p(u), p(v)) ?

I am testing this idea now, and I hope it can improve prediction accuracy.

PS.
I have tested this method on the residuals of the NSVD model. Using this clustering method to estimate group effects reduces the RMSE of the NSVD model from 0.8923 to 0.8910.
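As an illustration of s(u,v) = f(p(u), p(v)), here is a small Python sketch that takes f to be cosine similarity. That choice is an assumption made for the example; the post does not say which f was used. The helper names `cosine_similarity` and `most_similar_users` and the toy factor matrix `P` are hypothetical.

```python
import math

def cosine_similarity(p_u, p_v):
    """One possible f: cosine of the angle between two factor vectors."""
    dot = sum(a * b for a, b in zip(p_u, p_v))
    norm_u = math.sqrt(sum(a * a for a in p_u))
    norm_v = math.sqrt(sum(b * b for b in p_v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no direction to compare
    return dot / (norm_u * norm_v)

def most_similar_users(P, u, top_n=2):
    """Rank all other users by similarity of their latent factors to user u."""
    scores = [(cosine_similarity(P[u], P[v]), v)
              for v in range(len(P)) if v != u]
    scores.sort(reverse=True)
    return [v for _, v in scores[:top_n]]

# Toy latent factors p(u) for four users (made-up values).
P = [
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
print(most_similar_users(P, 0))  # [1, 2]
```

The same function applies unchanged to item factors q(i), and the resulting similarities could feed a clustering step or a neighborhood model.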

Wednesday, April 8, 2009

An improved item-based KNN predictor

Today I revised the item-based KNN predictor and, blended with 39 other predictors, got RMSE = 0.8730 on Quiz.

The classical item-based KNN first computes the similarity between items i and j as:

$s_{ij} = \frac{\sum_u (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_u (r_{ui} - b_{ui})^2 \sum_u (r_{uj} - b_{uj})^2}}$

Then the rating r(u,i) is predicted by:

$r_{ui} = b_{ui} + \frac{\sum_j s_{ij}(r_{uj} - b_{uj})}{\sum_j |s_{ij}|}$

I revised this predictor to:

$r_{ui} = b_{ui} + \frac{\sum_j s_{ij}(r_{uj} - b_{uj})}{\sum_j (\alpha + |s_{ij}|)}$

With a suitable choice of $\alpha$, this predictor produces more accurate predictions.
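The formulas above can be sketched in Python as follows. This is only an illustration: the function names and the toy numbers are hypothetical, and restricting the similarity sums to users who rated both items is an assumption, since the formula only writes a sum over u. Setting `alpha=0` gives the classical predictor.

```python
import math

def item_similarity(resid_i, resid_j):
    """s_ij from baseline residuals; each input maps user -> (r - b).
    Assumption: the sums run over users who rated both items."""
    common = resid_i.keys() & resid_j.keys()
    num = sum(resid_i[u] * resid_j[u] for u in common)
    den = math.sqrt(sum(resid_i[u] ** 2 for u in common)
                    * sum(resid_j[u] ** 2 for u in common))
    return num / den if den > 0 else 0.0

def predict_knn(b_ui, neighbors, alpha=0.0):
    """neighbors: list of (s_ij, r_uj - b_uj) for items j rated by user u.
    alpha is added once per neighbor in the denominator."""
    num = sum(s * resid for s, resid in neighbors)
    den = sum(alpha + abs(s) for s, _ in neighbors)
    if den == 0.0:
        return b_ui  # no usable neighbors: fall back to the baseline
    return b_ui + num / den

# Toy example with two neighbors and baseline b_ui = 3.0.
neighbors = [(0.5, 1.0), (0.25, -1.0)]
classical = predict_knn(3.0, neighbors, alpha=0.0)  # 3 + 0.25 / 0.75
revised = predict_knn(3.0, neighbors, alpha=0.5)    # 3 + 0.25 / 1.75
print(classical, revised)
```

For positive alpha the correction term is shrunk toward the baseline, and the shrinkage is strongest when the similarities are weak, because alpha then dominates the denominator.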

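Tuesday, April 7, 2009

Thoughts on rating distributions

A movie is rated by many people, and those ratings have properties: mean, variance, deviation, and so on. I am currently designing a new model. For each movie we compute its average rating, denoted m(i); then for a user-movie pair (u,i), I build the following model to estimate the rating: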

r(u,i) = mu + bu + bi + dot(p[u], q[i]) * h[k]

where k = m(i).

The model can be varied further; for example, we can also set k = var(i). In other words,

dot(p[u], q[i]) * h[k]

expresses user u's opinion of a movie i that has attribute k. I still train this model with gradient descent. Results will be posted later.
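A minimal Python sketch of the model r(u,i) = mu + bu + bi + dot(p[u], q[i]) * h[k] and one stochastic gradient step on the squared error. Several details are assumptions made for the example and are not stated in the post: m(i) is turned into an index k by bucketing it into `n_bins` equal-width bins over the 1 to 5 rating range, the biases and factors get L2 regularization `reg`, h[k] is left unregularized, and the model is stored in a plain dict. All names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bin_index(mean_rating, n_bins=8, lo=1.0, hi=5.0):
    """Assumed mapping from a movie's mean rating m(i) to a bucket k."""
    frac = (mean_rating - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))

def predict(model, u, i):
    k = model["k_of_item"][i]
    return (model["mu"] + model["bu"][u] + model["bi"][i]
            + dot(model["p"][u], model["q"][i]) * model["h"][k])

def sgd_step(model, u, i, r, lr=0.01, reg=0.02):
    """One gradient step on (r - prediction)^2; returns the error before the step."""
    k = model["k_of_item"][i]
    p_u, q_i, h_k = list(model["p"][u]), list(model["q"][i]), model["h"][k]
    interaction = dot(p_u, q_i)
    err = r - (model["mu"] + model["bu"][u] + model["bi"][i] + interaction * h_k)
    model["bu"][u] += lr * (err - reg * model["bu"][u])
    model["bi"][i] += lr * (err - reg * model["bi"][i])
    # h[k] scales the gradient of both factor vectors.
    model["p"][u] = [a + lr * (err * h_k * b - reg * a) for a, b in zip(p_u, q_i)]
    model["q"][i] = [b + lr * (err * h_k * a - reg * b) for a, b in zip(p_u, q_i)]
    model["h"][k] = h_k + lr * err * interaction
    return err

# Toy model: one user, one movie whose mean rating is m(i) = 4.0.
model = {
    "mu": 3.6,
    "bu": [0.0],
    "bi": [0.0],
    "p": [[0.1, 0.1]],
    "q": [[0.1, 0.1]],
    "h": [1.0] * 8,
    "k_of_item": [bin_index(4.0)],
}
initial_prediction = predict(model, 0, 0)
first_err = sgd_step(model, 0, 0, r=5.0)
for _ in range(20):
    last_err = sgd_step(model, 0, 0, r=5.0)
print(initial_prediction, first_err, last_err)
```

To use k = var(i) instead, only `k_of_item` changes: bucket each movie's rating variance rather than its mean.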
