Build Your Own Local Rental Ranking Scraper with Developer Tools

Published: 2025-12-11 13:03:22

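A friend of mine is changing jobs and has to find a place in a new city. He opened several rental platforms, flipped back and forth comparing prices and floor plans, and ended up bleary-eyed. Just watching him wore me out, so I wrote a small tool that scrapes listings from multiple platforms and generates a simple "local rental ranking."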

Why build your own ranking?

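Every rental app on the market has its own algorithm, and the listings it recommends always carry a commercial slant. You want a cheap one-room apartment; it pushes a long-term-rental membership package. Better to own the data yourself and sort by actual price, distance to the nearest subway station, or posting date, however you like.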

Tech stack: Python + Scrapy + Selenium

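Among mainstream scraping frameworks, Scrapy is fast and well suited to large-scale crawling. Some platforms render on the client side, though, so the data is loaded by JavaScript and is missing from the raw HTML. That is when Selenium comes in to drive a real browser.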

For example, to scrape one platform's listing page, start with a Scrapy request:

import scrapy

class RentSpider(scrapy.Spider):
    name = "rent"
    start_urls = ["https://example-rent.com/beijing/rent"]

    def parse(self, response):
        # Each ".house-item" node is one listing card on the page
        for item in response.css(".house-item"):
            # .get() returns None when a selector matches nothing
            yield {
                'title': item.css(".title::text").get(),
                'price': item.css(".price::text").get(),
                'location': item.css(".location::text").get(),
                'link': item.css("a::attr(href)").get(),
            }

For pages that load content dynamically, switch to Selenium:

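from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://dynamic-rent-site.com/shanghai")

    # Wait up to 10 seconds for the JavaScript-rendered listings to appear
    houses = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".list-item"))
    )
    for house in houses:
        print(house.find_element(By.CSS_SELECTOR, ".name").text)
finally:
    driver.quit()  # always close the browser, even if scraping fails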

Data cleaning and sorting logic

The scraped data is a mess: some prices read "3200元/月" (3,200 yuan per month), others "3.2k". Write a cleaning function to normalize them:

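import re

def clean_price(text):
    # Capture an integer or decimal number plus an optional "k" suffix,
    # so "3200元/月" -> 3200 and "3.2k" -> 3200 (not 3)
    match = re.search(r"(\d+(?:\.\d+)?)\s*([kK])?", text or "")
    if not match:
        return 0  # no number found; filter these out before ranking
    value = float(match.group(1))
    if match.group(2):
        value *= 1000
    return round(value)

# Usage
item['price_clean'] = clean_price(item['price'])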

Then sort however you need, for example putting listings close to the office first, then lower prices, then higher ratings:

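# Tuples compare left to right: distance, then price, then rating
# (rating is negated so that higher ratings come first)
ranked = sorted(houses, key=lambda x: (
    x['distance_to_office'],
    x['price_clean'],
    -x['rating'],
))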
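To make the tuple key concrete, here is a small self-contained sketch. The listings are made up for illustration (titles, distances, prices, and ratings are all invented), and the `rank` helper is just a hypothetical wrapper around the same sort key. Python compares tuples element by element, so distance decides first, price only breaks ties on distance, and rating only breaks ties on both.

```python
# Hypothetical sample data; field names follow the sort key used in this post.
houses = [
    {"title": "A", "distance_to_office": 2.0, "price_clean": 3500, "rating": 4.5},
    {"title": "B", "distance_to_office": 1.0, "price_clean": 4200, "rating": 4.0},
    {"title": "C", "distance_to_office": 2.0, "price_clean": 3500, "rating": 4.8},
    {"title": "D", "distance_to_office": 2.0, "price_clean": 3200, "rating": 3.9},
]

def rank(listings):
    """Sort by distance (ascending), then price (ascending), then rating (descending)."""
    return sorted(
        listings,
        key=lambda x: (x["distance_to_office"], x["price_clean"], -x["rating"]),
    )

order = [h["title"] for h in rank(houses)]
print(order)  # ['B', 'D', 'C', 'A']
```

B wins on distance even though it is the most expensive, which shows how strict the ordering is. If that feels too rigid, one option is to round the distance to the nearest kilometer before sorting so that price gets a say among roughly equidistant listings.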

Local deployment with scheduled updates

Use cron to run the job every day at 6 a.m., store the results in SQLite, and serve them on a simple page built with Flask:

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def index():
    data = get_latest_rankings()  # read the latest rankings from the database
    return render_template('index.html', houses=data)

if __name__ == '__main__':
    app.run()
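The view calls get_latest_rankings() without showing it. Here is one possible sketch of the storage side using the standard-library sqlite3 module. The file name rankings.db, the table name and column layout, and the save_rankings helper are all assumptions for illustration, not taken from the original project.

```python
import sqlite3
from datetime import date

DB_PATH = "rankings.db"  # assumed file name

def save_rankings(houses, db_path=DB_PATH, day=None):
    """Store one day's ranked listings; list position becomes the rank."""
    day = day or date.today().isoformat()
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS rankings ("
                "day TEXT, pos INTEGER, title TEXT, price INTEGER, link TEXT)"
            )
            # Re-running the job on the same day replaces that day's rows
            conn.execute("DELETE FROM rankings WHERE day = ?", (day,))
            conn.executemany(
                "INSERT INTO rankings VALUES (?, ?, ?, ?, ?)",
                [(day, i, h["title"], h["price_clean"], h["link"])
                 for i, h in enumerate(houses, start=1)],
            )
    finally:
        conn.close()

def get_latest_rankings(db_path=DB_PATH):
    """Return the most recent day's listings as dicts, best rank first."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(
            "SELECT title, price, link FROM rankings "
            "WHERE day = (SELECT MAX(day) FROM rankings) ORDER BY pos"
        ).fetchall()
        return [dict(r) for r in rows]
    finally:
        conn.close()
```

The daily job would call save_rankings(ranked) after scraping and sorting. A crontab entry along the lines of `0 6 * * * python3 /path/to/update_rankings.py` (the script path is hypothetical) would trigger it at 6 a.m.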

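Now my friend glances at my "local rental ranking" over coffee every morning and sees at once which districts offer the best value. And the time saved? Half an hour more sleep, what's not to like?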