Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Building a Basic Collaborative Filtering Recommender with Python

Tech May 18 2
import numpy as np


def compute_pair_similarity(vec_a, vec_b):
    shared_mask = (vec_a > 0) & (vec_b > 0)
    if shared_mask.sum() == 0:
        return 0.0
    a = vec_a * shared_mask
    b = vec_b * shared_mask
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def generate_suggestions(target_user, score_matrix, top_k=2):
    item_count = score_matrix.shape[1]
    sim_scores = np.zeros(item_count)

    user_profile = score_matrix[:, target_user]
    for idx in range(item_count):
        sim_scores[idx] = compute_pair_similarity(user_profile, score_matrix[:, idx])

    nearest_indices = np.argsort(sim_scores)[-top_k:]
    rated_sum = np.zeros(item_count)
    for idx in nearest_indices:
        rated_sum[idx] = score_matrix[:, idx].sum()

    return np.argsort(rated_sum)[::-1]


def run_demo():
    ratings = np.array([
        [5, 3, 0, 1],
        [4, 0, 4, 4],
        [1, 1, 3, 2],
        [0, 0, 4, 5],
        [2, 2, 0, 0]
    ])

    user_id = 0
    ordered_items = generate_suggestions(user_id, ratings, top_k=2)
    print(f"Item recommendation order for user {user_id}: {ordered_items}")


if __name__ == "__main__":
    run_demo()

How the Algorithm Works

The code implements a straightforward item-based collaborative filtering suggestion engine. It assumes that items liked by similar user clusters are relevant to a target user. The scoring mechanism relies on a rating matrix where rows represent items, columns represent users, and each cell holds a rating value (zero indicates no rating).

A core component is a similarity function that measures the affinity between two item vectors. Cosine similarity is applied only on positions where both items have received ratings, which helps avoid bias from missing data. If no common ratings exist, the similarity defaults to zero.

The suggestion pipeline is executed by generate_suggestions. It takes a target user identifier, the rating matrix, and a optional neighbor count top_k. The process follows these steps:

  • For every item column, compute its similarity with the target user’s rating profile using the masked cosine function.
  • Identify the k items most similar to the user’s taste.
  • Use those neighbors to estimate item scores by aggregating the total ratings each neighbor item has received.
  • Return the item indices sorted in descending order of their score, forming a ranked recommendation list.

This compact approach demonstrates the essential mechanism behind more sophisticated recommender frameworks while remaining easy to modify or extend.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

SBUS Signal Analysis and Communication Implementation Using STM32 with Fus Remote Controller

Overview In a recent project, I utilized the SBUS protocol with the Fus remote controller to control a vehicle's basic operations, including movement, lights, and mode switching. This article is aimed...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.