Fading Coder

Building a Basic Collaborative Filtering Recommender with Python

Tech May 18 14
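import numpy as np


def compute_pair_similarity(vec_a, vec_b):
    """Cosine similarity restricted to positions rated (> 0) in both vectors."""
    shared_mask = (vec_a > 0) & (vec_b > 0)
    if not shared_mask.any():
        return 0.0  # no co-rated positions, so no evidence of similarity
    a = vec_a[shared_mask].astype(float)
    b = vec_b[shared_mask].astype(float)
    # Both norms are non-zero here because every kept entry is positive.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def generate_suggestions(target_user, score_matrix, top_k=2):
    """Rank the columns of score_matrix relative to column target_user."""
    column_count = score_matrix.shape[1]
    sim_scores = np.zeros(column_count)

    # The target profile is read from a column, like every candidate below.
    user_profile = score_matrix[:, target_user]
    for idx in range(column_count):
        sim_scores[idx] = compute_pair_similarity(user_profile, score_matrix[:, idx])

    # The target column always matches itself, so keep it out of the neighbor set.
    ranked_by_similarity = np.argsort(sim_scores)[::-1]
    nearest_indices = [i for i in ranked_by_similarity if i != target_user][:top_k]

    # Score each neighbor by the sum of all ratings in its column; others stay at 0.
    rated_sum = np.zeros(column_count)
    for idx in nearest_indices:
        rated_sum[idx] = score_matrix[:, idx].sum()

    # Highest score first.
    return np.argsort(rated_sum)[::-1]


def run_demo():
    # 0 marks a missing rating.
    ratings = np.array([
        [5, 3, 0, 1],
        [4, 0, 4, 4],
        [1, 1, 3, 2],
        [0, 0, 4, 5],
        [2, 2, 0, 0]
    ])

    user_id = 0
    ordered_items = generate_suggestions(user_id, ratings, top_k=2)
    print(f"Item recommendation order for user {user_id}: {ordered_items}")


if __name__ == "__main__":
    run_demo()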

How the Algorithm Works

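The code implements a small neighborhood-based collaborative filtering recommender. The underlying assumption is that entries with similar rating patterns are relevant to one another. Scoring relies on a rating matrix in which each cell holds a rating and zero means no rating. As written, the code works on columns throughout: generate_suggestions reads the target profile from score_matrix[:, target_user], compares it with every column, and returns column indices. The target index and the returned indices therefore refer to the same axis, so with rows as items and columns as users the function ranks users rather than items. Reading the columns as items instead makes the result an item ranking, with target_user selecting a reference item.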
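
To make the layout concrete, the sketch below reads one column and one row out of the demo matrix using plain Python lists. The helper names column, row and rated_positions are illustrative and do not appear in the article's code; which axis stands for users and which for items is a convention the caller has to fix before slicing.

```python
# Demo matrix from the article; 0 marks a missing rating.
ratings = [
    [5, 3, 0, 1],
    [4, 0, 4, 4],
    [1, 1, 3, 2],
    [0, 0, 4, 5],
    [2, 2, 0, 0],
]


def column(matrix, j):
    """Return column j as a list (the kind of slice the article's code uses as a profile)."""
    return [r[j] for r in matrix]


def row(matrix, i):
    """Return row i as a list."""
    return list(matrix[i])


def rated_positions(vec):
    """Indices whose rating is present (non-zero)."""
    return [k for k, v in enumerate(vec) if v > 0]


print(column(ratings, 0))                   # [5, 4, 1, 0, 2]
print(row(ratings, 0))                      # [5, 3, 0, 1]
print(rated_positions(column(ratings, 0)))  # [0, 1, 2, 4]
```

A column of this 5 x 4 matrix has five entries and a row has four, so a row profile cannot be compared directly against a column.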

The core component is compute_pair_similarity, which measures the affinity between two rating vectors. It applies cosine similarity only at positions where both vectors hold a rating, so that zeros standing for missing ratings are not treated as low scores. If the two vectors share no rated position, the similarity is zero.
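
The arithmetic can be checked by hand on two columns of the demo matrix. The sketch below restates the masked cosine in plain Python so each step is visible; masked_cosine is a name chosen for this illustration and is not the article's NumPy function.

```python
import math


def masked_cosine(vec_a, vec_b):
    """Cosine similarity over positions where both vectors are rated (> 0)."""
    pairs = [(a, b) for a, b in zip(vec_a, vec_b) if a > 0 and b > 0]
    if not pairs:
        return 0.0  # no co-rated positions
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b)


col_0 = [5, 4, 1, 0, 2]
col_1 = [3, 0, 1, 0, 2]

# Co-rated positions are 0, 2 and 4, so the vectors compared are
# [5, 1, 2] and [3, 1, 2]: dot = 20, norms = sqrt(30) and sqrt(14).
print(round(masked_cosine(col_0, col_1), 4))  # 0.9759
print(masked_cosine([1, 0, 2], [0, 3, 0]))    # 0.0
```

One consequence of the masking is worth knowing: when only a single position is co-rated, the compared vectors are one-dimensional and the similarity is 1.0 whatever the two ratings are.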

The suggestion pipeline is executed by generate_suggestions. It takes a target user identifier, the rating matrix, and an optional neighbor count top_k. The process follows these steps:

  • For every column of the matrix, compute its similarity with the target profile using the masked cosine function.
  • Identify the k columns most similar to the target profile.
  • Score each of those neighbors by the sum of all ratings in its column; every other column keeps a score of zero.
  • Return the column indices sorted in descending order of score, forming a ranked recommendation list.
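
The steps above can be traced on the demo matrix with the intermediate values exposed. This is a plain-Python sketch, not the article's function: trace_pipeline, column and masked_cosine are illustrative names, and the sketch assumes the target column is left out of its own neighbor set because it always matches itself.

```python
import math

ratings = [
    [5, 3, 0, 1],
    [4, 0, 4, 4],
    [1, 1, 3, 2],
    [0, 0, 4, 5],
    [2, 2, 0, 0],
]


def column(matrix, j):
    return [r[j] for r in matrix]


def masked_cosine(vec_a, vec_b):
    pairs = [(a, b) for a, b in zip(vec_a, vec_b) if a > 0 and b > 0]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b)


def trace_pipeline(matrix, target, top_k=2):
    n_cols = len(matrix[0])
    profile = column(matrix, target)
    # Step 1: similarity of every column to the target profile.
    sims = [masked_cosine(profile, column(matrix, j)) for j in range(n_cols)]
    # Step 2: the top_k most similar columns (assumption: target excluded).
    candidates = [j for j in range(n_cols) if j != target]
    neighbors = sorted(candidates, key=lambda j: sims[j], reverse=True)[:top_k]
    # Step 3: score each neighbor by the sum of the ratings in its column.
    totals = {j: sum(column(matrix, j)) for j in neighbors}
    # Step 4: rank neighbors by that score, highest first.
    ranked = sorted(neighbors, key=lambda j: totals[j], reverse=True)
    return sims, neighbors, totals, ranked


sims, neighbors, totals, ranked = trace_pipeline(ratings, target=0)
print(neighbors)  # [1, 2]
print(totals)     # {1: 6, 2: 11}
print(ranked)     # [2, 1]
```

Column 1 is the closest neighbor by similarity, yet column 2 ranks first because the final ordering depends only on rating totals, not on how similar each neighbor is.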

The mechanism shown here, similarity between rating vectors followed by neighbor selection and ranking, is the same one that fuller recommender frameworks build on, and the code is short enough to modify or extend easily.
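
As one possible extension, the sketch below predicts a rating for each item a user has not rated by taking a similarity-weighted average of that user's existing ratings. This is a common item-based variant and not the article's method. It assumes rows are users and columns are items, and predict_unrated and masked_cosine are names introduced only for this illustration.

```python
import math

ratings = [
    [5, 3, 0, 1],
    [4, 0, 4, 4],
    [1, 1, 3, 2],
    [0, 0, 4, 5],
    [2, 2, 0, 0],
]


def masked_cosine(vec_a, vec_b):
    pairs = [(a, b) for a, b in zip(vec_a, vec_b) if a > 0 and b > 0]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b)


def predict_unrated(matrix, user):
    """Predict ratings for the items `user` has not rated.

    Assumption of this sketch: rows are users, columns are items.
    Returns (item_index, predicted_rating) pairs, best first.
    """
    n_items = len(matrix[0])
    item_columns = [[r[j] for r in matrix] for j in range(n_items)]
    profile = matrix[user]
    rated = [i for i in range(n_items) if profile[i] > 0]

    predictions = {}
    for j in range(n_items):
        if profile[j] > 0:
            continue  # already rated, nothing to predict
        # Weight each of the user's ratings by how similar that item is to item j.
        weighted = [(masked_cosine(item_columns[i], item_columns[j]), profile[i])
                    for i in rated]
        total_weight = sum(w for w, _ in weighted)
        if total_weight > 0:
            predictions[j] = sum(w * r for w, r in weighted) / total_weight
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)


for item, score in predict_unrated(ratings, user=0):
    print(item, round(score, 2))
```

Because the weights are non-negative, each prediction is a weighted average and stays within the range of the ratings the user has already given.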
