Building a Basic Collaborative Filtering Recommender with Python
```python
import numpy as np


def compute_pair_similarity(vec_a, vec_b):
    """Cosine similarity restricted to positions rated in both vectors."""
    shared_mask = (vec_a > 0) & (vec_b > 0)
    if shared_mask.sum() == 0:
        return 0.0
    a = vec_a * shared_mask
    b = vec_b * shared_mask
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def generate_suggestions(target_user, score_matrix, top_k=2):
    """Rank items for target_user using the ratings of the top_k most similar users."""
    # Rows are items and columns are users, so shape[1] is the number of users.
    user_count = score_matrix.shape[1]
    sim_scores = np.zeros(user_count)
    user_profile = score_matrix[:, target_user]
    for idx in range(user_count):
        sim_scores[idx] = compute_pair_similarity(user_profile, score_matrix[:, idx])
    # A user is always most similar to themselves, so exclude the target user.
    sim_scores[target_user] = -np.inf
    nearest_indices = np.argsort(sim_scores)[-top_k:]
    # Score each item by the total rating it received from the nearest users.
    rated_sum = score_matrix[:, nearest_indices].sum(axis=1)
    return np.argsort(rated_sum)[::-1]


def run_demo():
    # 5 items (rows) rated by 4 users (columns); 0 means "not rated".
    ratings = np.array([
        [5, 3, 0, 1],
        [4, 0, 4, 4],
        [1, 1, 3, 2],
        [0, 0, 4, 5],
        [2, 2, 0, 0],
    ])
    user_id = 0
    ordered_items = generate_suggestions(user_id, ratings, top_k=2)
    print(f"Item recommendation order for user {user_id}: {ordered_items}")


if __name__ == "__main__":
    run_demo()
```
How the Algorithm Works
The code implements a simple user-based (neighborhood) collaborative filtering recommender. It assumes that items rated highly by users similar to the target user are likely to be relevant to that user. Scoring relies on a rating matrix in which rows represent items, columns represent users, and each cell holds a rating (zero means the user has not rated the item).
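To make the matrix layout concrete, the short sketch below reuses the demo matrix and reads one user's profile (a column) and one item's ratings (a row). The variable names are illustrative and not part of the original code.

```python
import numpy as np

# Rows are items, columns are users; 0 means "not rated".
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 4],
    [1, 1, 3, 2],
    [0, 0, 4, 5],
    [2, 2, 0, 0],
])

n_items, n_users = ratings.shape       # 5 items, 4 users
user_profile = ratings[:, 0]           # ratings given by user 0 to every item
item_ratings = ratings[3, :]           # ratings received by item 3 from every user
unrated_by_user0 = np.flatnonzero(user_profile == 0)

print(n_items, n_users)      # 5 4
print(user_profile)          # [5 4 1 0 2]
print(item_ratings)          # [0 0 4 5]
print(unrated_by_user0)      # [3]
```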
The core component is a similarity function that measures the affinity between two users’ rating vectors. Cosine similarity is computed only over positions where both users have rated the item, which keeps missing ratings from distorting the result. If the two users share no rated items, the similarity defaults to zero.
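A small worked example may help show what the masking changes. The sketch below compares a plain cosine similarity with a masked one on two rating vectors. The helper names `plain_cosine` and `masked_cosine` are hypothetical; the masked version is meant to be equivalent to the function in the listing, written with boolean indexing instead of multiplication by the mask.

```python
import numpy as np


def plain_cosine(vec_a, vec_b):
    """Ordinary cosine similarity; unrated positions (zeros) still inflate the norms."""
    return float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))


def masked_cosine(vec_a, vec_b):
    """Cosine similarity over positions where both vectors hold a rating."""
    shared = (vec_a > 0) & (vec_b > 0)
    if not shared.any():
        return 0.0
    a, b = vec_a[shared], vec_b[shared]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


u = np.array([5, 4, 1, 0, 2])
v = np.array([3, 0, 1, 0, 2])
w = np.array([0, 0, 0, 3, 0])

# u and v share positions 0, 2 and 4: the pairs (5, 3), (1, 1), (2, 2).
print(round(masked_cosine(u, v), 4))   # 20 / sqrt(30 * 14), about 0.9759
print(round(plain_cosine(u, v), 4))    # 20 / sqrt(46 * 14), about 0.7881
print(masked_cosine(u, w))             # 0.0 (no shared ratings)
```

The plain version is lower because the rating of 4 that only `u` holds enlarges its norm without contributing to the dot product.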
The suggestion pipeline is executed by generate_suggestions. It takes a target user identifier, the rating matrix, and an optional neighbor count top_k. The process follows these steps:
- For every user column, compute its similarity with the target user’s rating profile using the masked cosine function.
- Identify the top_k users most similar to the target user’s taste.
- Use those neighbors to estimate item scores by summing the ratings each item has received from them.
- Return the item indices sorted in descending order of their score, forming a ranked recommendation list.
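The steps above can be traced on the demo matrix with the intermediate values printed. This is a sketch under a few assumptions: the target user is left out of their own neighbor set, the helper name `masked_cosine` is hypothetical, and a stable sort is requested so that tied scores come out in a reproducible order.

```python
import numpy as np


def masked_cosine(a, b):
    """Cosine similarity over positions where both vectors hold a rating."""
    shared = (a > 0) & (b > 0)
    if not shared.any():
        return 0.0
    a, b = a[shared], b[shared]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rows are items, columns are users; 0 means "not rated".
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 4],
    [1, 1, 3, 2],
    [0, 0, 4, 5],
    [2, 2, 0, 0],
])
target_user, top_k = 0, 2

# Step 1: similarity between the target user's column and every user column.
profile = ratings[:, target_user]
sims = np.array([masked_cosine(profile, ratings[:, u]) for u in range(ratings.shape[1])])
sims[target_user] = -np.inf            # assumption: a user is not their own neighbor

# Step 2: the top_k most similar users.
neighbors = np.argsort(sims)[-top_k:]

# Step 3: total rating each item received from those neighbors.
item_scores = ratings[:, neighbors].sum(axis=1)

# Step 4: item indices in descending score order (ties ordered by the stable sort).
ranking = np.argsort(item_scores, kind="stable")[::-1]

print(np.round(sims[1:], 4))   # [0.9759 0.9216 0.7745]
print(sorted(neighbors))       # users 1 and 2
print(item_scores)             # [3 4 4 4 2]
print(ranking)                 # [3 2 1 0 4]
```

Items 1, 2 and 3 tie at a score of 4 here, so their relative order depends on the sort rather than on the ratings.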
This compact approach demonstrates the essential mechanism behind more sophisticated recommender frameworks while remaining easy to modify or extend.
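As one illustration of such an extension, the sketch below weights each neighbor's ratings by its similarity and ranks only items the target user has not rated yet. Neither refinement is part of the code above; the function name `weighted_unrated_ranking` and the toy inputs are hypothetical.

```python
import numpy as np


def weighted_unrated_ranking(profile, neighbor_ratings, neighbor_sims):
    """Rank unrated items by similarity-weighted neighbor ratings.

    profile:          (n_items,) target user's ratings, 0 = not rated
    neighbor_ratings: (n_items, k) ratings given by the k neighbors
    neighbor_sims:    (k,) similarity of each neighbor to the target user
    """
    scores = neighbor_ratings @ neighbor_sims        # weighted sum per item
    candidates = np.flatnonzero(profile == 0)        # items not yet rated
    order = np.argsort(scores[candidates], kind="stable")[::-1]
    return candidates[order], scores[candidates][order]


# Toy inputs, made up for illustration only.
profile = np.array([5, 0, 1, 0, 2])
neighbor_ratings = np.array([
    [3, 0],
    [0, 4],
    [1, 3],
    [5, 4],
    [2, 0],
])
neighbor_sims = np.array([0.5, 1.0])

items, scores = weighted_unrated_ranking(profile, neighbor_ratings, neighbor_sims)
print(items)    # [3 1]
print(scores)   # [6.5 4. ]
```

Filtering to unrated items keeps the list from recommending things the user has already seen, and the weighting lets closer neighbors count for more than distant ones.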