Scraping NetEase Cloud Music Hot Comments to Generate Word Clouds

Data Collection

Building a word cloud requires raw data first. For NetEase Cloud Music, this involves several steps:

  • Packet analysis to locate the API endpoint
  • Handling encrypted request parameters
  • Extracting hot comment content

Packet Analysis

With Chrome DevTools open on a song page, the comment API endpoint shows up in the Network panel. The request uses the POST method, and its form body carries encrypted parameters.

Handling Encrypted Parameters

The NetEase Cloud Music API requires two encrypted fields: params and encSecKey. These values can be extracted from browser requests and reused across different song IDs. For deep technical details on the encryption mechanism, refer to the NetEase Cloud Music API analysis projects available on GitHub.
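
Because the captured values are long and opaque, it can be convenient to keep them out of the source file. The following is a minimal sketch of that idea, not part of the original script: the environment variable names NETEASE_PARAMS and NETEASE_ENC_SEC_KEY and the helper load_captured_fields are hypothetical, chosen only for illustration.

```python
import os

def load_captured_fields(env=None):
    """Build the POST form fields from values captured in the browser.

    The variable names below are hypothetical; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    fields = {
        'params': env.get('NETEASE_PARAMS'),
        'encSecKey': env.get('NETEASE_ENC_SEC_KEY'),
    }
    # Refuse to continue if either captured value is missing or empty
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError('Missing captured fields: ' + ', '.join(missing))
    return fields

# Demonstration with a plain dict standing in for the real environment
demo = load_captured_fields({
    'NETEASE_PARAMS': 'abc',
    'NETEASE_ENC_SEC_KEY': 'def',
})
print(sorted(demo))
```

The returned dictionary has the same two keys as the form data posted to the comment endpoint, so it could be passed as the request body in place of hard-coded strings.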

Extracting Hot Comments

A request to the endpoint returns JSON containing comment objects. Parse the JSON and read the content field of each entry in the hotComments list.

import requests
import json

def fetch_hot_comments(song_id):
    api_url = f'http://music.163.com/weapi/v1/resource/comments/R_SO_4_{song_id}?csrf_token=test'
    
    post_data = {
        'params': '4hmFbT9ZucQPTM8ly/UA60NYH1tpyzhHOx04qzjEh3hU1597xh7pBOjRILfbjNZHqzzGby5ExblBpOdDLJxOAk4hBVy5/XNwobA+JTFPiumSmVYBRFpizkWHgCGO+OWiuaNPVlmr9m8UI7tJv0+NJoLUy0D6jd+DnIgcVJlIQDmkvfHbQr/i9Sy+SNSt6Ltq',
        'encSecKey': 'a2c2e57baee7ca16598c9d027494f40fbd228f0288d48b304feec0c52497511e191f42dfc3e9040b9bb40a9857fa3f963c6a410b8a2a24eea02e66f3133fcb8dbfcb1d9a5d7ff1680c310a32f05db83ec920e64692a7803b2b5d7f99b14abf33cfa7edc3e57b1379648d25b3e4a9cab62c1b3a68a4d015abedcd1bb7e868b676'
    }
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Referer': f'http://music.163.com/song?id={song_id}',
        'Host': 'music.163.com',
        'Origin': 'http://music.163.com'
    }
    
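    # Send the form-encoded POST; the timeout keeps a stalled connection from hanging the script
    response = requests.post(api_url, headers=headers, data=post_data, timeout=10)
    response.raise_for_status()  # fail early on HTTP error status codes
    result = json.loads(response.text)

    # Default to an empty list if the 'hotComments' key is missing
    comments = []
    for entry in result.get('hotComments', []):
        comments.append(entry['content'])

    return comments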

if __name__ == '__main__':
    song_id = 439915614
    hot_comments = fetch_hot_comments(song_id)
    for comment in hot_comments:
        print(comment)

Running this script outputs the hot comments for the specified song.
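
The parsing step can also be separated from the network call so it can be checked offline. This is a minimal sketch, assuming the response has the shape used above (a top-level hotComments list whose items carry a content field). The helper name extract_hot_comments and the sample payload are invented for illustration and are not real API output.

```python
import json

def extract_hot_comments(payload):
    """Return the content strings of all hot comments in a decoded response."""
    comments = []
    # 'or []' also covers the case where the key is present but null
    for entry in payload.get('hotComments') or []:
        content = entry.get('content')
        if content:  # skip entries with missing or empty text
            comments.append(content)
    return comments

# Invented sample payload for illustration only
sample = json.loads(
    '{"hotComments": [{"content": "first"}, {"content": ""}, {"content": "second"}]}'
)
print(extract_hot_comments(sample))
```

Keeping the parser free of network code makes it easy to exercise against saved responses while the captured request parameters are still valid.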

Word Cloud Generation

The wordcloud library provides straightforward word cloud generation. Install it with pip (pip install wordcloud matplotlib, since the example below also uses matplotlib for display) and consult the official documentation for basic usage patterns.

Chinese text rendering requires specifying a font file that supports Chinese characters. The font_path parameter in the WordCloud constructor handles this:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assumes the scraping script above is saved as scraper.py in the same directory
from scraper import fetch_hot_comments

song_id = 439915614
text_content = " ".join(fetch_hot_comments(song_id))

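cloud = WordCloud(
    random_state=1,  # fixed seed so the layout is reproducible
    # Font file with Chinese glyphs; adjust the path for your system
    font_path=r'C:/Users/Windows/fonts/simkai.ttf'
).generate(text_content)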

plt.figure()
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Visual Results

The generated word cloud displays frequently occurring terms from the hot comments, providing an intuitive visualization of what resonates most with listeners.
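
To see which terms drive the picture, the frequencies can be counted directly. The sketch below is an illustration using only the standard library; the helper term_frequencies and the sample comments are invented. It splits on non-word characters, which works for space-delimited languages. Chinese comments would first need a word segmenter (jieba is one commonly used option), since a run of Chinese characters without punctuation would otherwise count as a single term.

```python
import re
from collections import Counter

def term_frequencies(comments, min_length=2):
    """Count how often each term appears across a list of comments."""
    counts = Counter()
    for comment in comments:
        # \w+ matches runs of word characters (letters, digits, underscore)
        for term in re.findall(r'\w+', comment.lower()):
            if len(term) >= min_length:
                counts[term] += 1
    return counts

# Invented sample comments for illustration only
sample_comments = ['Great song, great voice', 'This song is on repeat']
freqs = term_frequencies(sample_comments)
print(freqs.most_common(3))
```

A mapping like this can also be handed to WordCloud.generate_from_frequencies instead of passing raw text to generate, which gives control over how terms are counted.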

Potential Enhancements

  • Custom masks: Generate word clouds shaped like specific images or patterns
  • Batch processing: Scrape comments from multiple songs by iterating through different song IDs
  • Service extraction: Wrap the functionality into a REST API for serving word clouds on demand
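
The batch processing idea can be sketched as a loop over song IDs. This is one possible shape under stated assumptions, not the author's implementation: collect_comments is a hypothetical helper, the fetch argument stands for a function such as fetch_hot_comments, and the one-second default pause is an arbitrary choice. A stand-in fetcher keeps the example runnable without network access.

```python
import time

def collect_comments(song_ids, fetch, delay_seconds=1.0):
    """Fetch hot comments for each song ID, pausing between requests."""
    results = {}
    for index, song_id in enumerate(song_ids):
        if index > 0:
            time.sleep(delay_seconds)  # pause so requests are not sent back to back
        try:
            results[song_id] = fetch(song_id)
        except Exception as exc:  # keep going if one song fails
            print(f'Skipping {song_id}: {exc}')
            results[song_id] = []
    return results

# Stand-in fetcher for illustration; replace with the real scraping function
def fake_fetch(song_id):
    return [f'comment for {song_id}']

batch = collect_comments([1, 2], fake_fetch, delay_seconds=0)
print(batch)
```

Passing the fetch function in as an argument keeps the loop independent of the scraping details, so the same helper works with a cached or mocked source.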
