
I'm trying to implement Adagrad in Python. For learning purposes, I'm using matrix factorisation as an example, and I'll be using Autograd to compute the gradients.

My main question is whether the implementation is correct.

Problem description

Given a matrix A (M x N) with some missing entries, decompose it into W and H of sizes (M x k) and (k x N) respectively. The goal is to learn W and H using Adagrad. I'll be following this guide for the Autograd implementation.

NB: I am well aware that ALS-based implementations are better suited to this problem. I'm using Adagrad purely for learning purposes.
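Concretely, the quantity being minimised is the root-mean-squared error over the observed entries only (this is exactly what the cost function below computes):

$$\text{cost}(W, H) = \sqrt{\frac{1}{|\Omega|} \sum_{(i, j) \in \Omega} \big((W H)_{ij} - A_{ij}\big)^2}$$

where $\Omega$ is the set of non-missing entries of $A$.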

Customary imports

import autograd.numpy as np
import pandas as pd

Creating the matrix to be decomposed

A = np.array([[3, 4, 5, 2],
              [4, 4, 3, 3],
              [5, 5, 4, 3]], dtype=np.float32).T

Masking one entry

A[0, 0] = np.nan

Defining the cost function

def cost(W, H):
    pred = np.dot(W, H)
    mask = ~np.isnan(A)
    return np.sqrt(((pred - A)[mask].flatten() ** 2).mean(axis=None))
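As a quick sanity check (not part of the original code, and using a hypothetical fixed seed only to make it repeatable), the cost can be evaluated on a small random initialisation with rank 2:

np.random.seed(0)  # hypothetical seed, just for a repeatable check
W0 = np.abs(np.random.randn(A.shape[0], 2))
H0 = np.abs(np.random.randn(2, A.shape[1]))
print(cost(W0, H0))  # RMSE over the eleven observed entries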

Decomposition params

rank = 2
learning_rate = 0.01
n_steps = 10000

Gradient of cost wrt params W and H

from autograd import grad, multigrad
grad_cost = multigrad(cost, argnums=[0, 1])
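If multigrad is not available (it has been removed from newer autograd releases), an equivalent sketch, assuming a current autograd where grad accepts an argnum argument, is to build one gradient function per argument and wrap them so the rest of the code stays unchanged:

from autograd import grad

# one gradient function per argument; argnum selects which argument to differentiate
grad_cost_W = grad(cost, 0)
grad_cost_H = grad(cost, 1)

def grad_cost(W, H):
    # returns the same (dW, dH) pair that multigrad would produce
    return grad_cost_W(W, H), grad_cost_H(W, H)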

Main Adagrad routine (this needs to be checked)

shape = A.shape

# Initialising W and H
H = np.abs(np.random.randn(rank, shape[1]))
W = np.abs(np.random.randn(shape[0], rank))

# gt_w and gt_h accumulate the sum of squared gradients for W and H
gt_w = np.zeros_like(W)
gt_h = np.zeros_like(H)

# stability factor
eps = 1e-8
print "Iteration, Cost"
for i in range(n_steps):

    if i%1000==0:
        print "*"*20
        print i,",", cost(W, H)

    # computing grad. wrt W and H
    del_W, del_H = grad_cost(W, H)

    # accumulating the sum of squared gradients
    gt_w += np.square(del_W)
    gt_h += np.square(del_H)

    # per-parameter (adaptive) learning rates
    mod_learning_rate_W = np.divide(learning_rate, np.sqrt(gt_w + eps))
    mod_learning_rate_H = np.divide(learning_rate, np.sqrt(gt_h + eps))
    W = W - del_W * mod_learning_rate_W
    H = H - del_H * mod_learning_rate_H
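Once the loop finishes, a quick way to inspect the result (again, not part of the original code) is to look at the reconstruction and the value imputed for the masked entry:

pred = np.dot(W, H)
print(pred[0, 0])          # the estimate Adagrad produces for the masked entry A[0, 0]
print(np.round(pred, 2))   # full reconstruction, to compare against A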

While the optimisation converges and I get a reasonable solution, I was wondering whether the implementation is correct. Specifically, is my understanding of accumulating the sum of squared gradients and then computing the adaptive learning rate correct?
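For reference, the per-parameter Adagrad update I'm trying to reproduce is (with all operations element-wise, $\eta$ the learning rate and $\epsilon$ the stability factor):

$$G_t = \sum_{\tau=1}^{t} g_\tau \odot g_\tau, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t + \epsilon}} \odot g_t$$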
