Fetching App Store reviews

赵云龙龙

Published 2021-01-26 22:06:54


This article is included in the column: python爱好部落

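Recently someone told me that a certain app's rating had been climbing steadily. Everyone was delighted, even a little smug. Our first suspicion was, well, you know what, but they swore that wasn't it. So I decided to spend ten minutes getting to the bottom of it. First I searched online and found a few projects with source code. I ran them, but they were weak and produced lots of garbled text. They did show, though, that there is an API you can call, which saves you from parsing HTML or writing regular expressions: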

https://itunes.apple.com/rss/customerreviews/page=1/id=249015/sortby=mostrecent/json?l=en&&cc=cn
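Only a few parts of this URL change from request to request. A minimal sketch that builds it from parameters follows, assuming the path segments keep the order shown (`page=`, `id=`, `sortby=`) and that `cc` selects the storefront country. `build_review_url` is a hypothetical helper, not part of any Apple API, and it joins the query parameters with a single `&`.

```python
def build_review_url(app_id: int, page: int = 1, country: str = "cn", lang: str = "en") -> str:
    """Build the customer-reviews RSS URL for one page of one app (hypothetical helper)."""
    # Same layout as the URL above; only page, id and the query string vary.
    return (
        "https://itunes.apple.com/rss/customerreviews/"
        f"page={page}/id={app_id}/sortby=mostrecent/json?l={lang}&cc={country}"
    )


# Page 2 of the reviews for the app id used in the example URL.
print(build_review_url(249015, page=2))
```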

To look at a different app, change the id in this URL. Pretty-print the response and you get neatly structured JSON like this:

{
    "feed": {
        "author": {
            "name": {
                "label": "iTunes Store"
            },
            "uri": {
                "label": "http://www.apple.com/uk/itunes/"
            }
        },
        "entry": [{
                "author": {
                    "uri": {
                        "label": "https://itunes.apple.com/cn/reviews/id47833?l=en"
                    },
                    "name": {
                        "label": "vivitaec"
                    },
                    "label": ""
                },
                "im:version": {
                    "label": "2.1.9"
                },
                "im:rating": {
                    "label": "5"
                },
                "id": {
                    "label": "6739776859"
                },
                "title": {
                    "label": "新系统很好用"
                },
                "content": {
                    "label": "每天利用零碎时间学真的蛮方便的，APP设计也越来越贴合我们的要求，新系统挺好的",
                    "attributes": {
                        "type": "text"
                    }
                },
                "link": {
                    "attributes": {
                        "rel": "related",
                        "href": "https://itunes.apple.com/cn/review?id=1324390&l=en&type=Purple%20Software"
                    }
                },
                "im:voteSum": {
                    "label": "0"
                },

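Seeing five-star reviews like this is heartwarming. From here, jsonpath can pull out the version number, the star rating, the review text, and anything else you need. Here is some quick code: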

        name = jsonpath.jsonpath(i, '$.author.name.label')  # Works however deeply the value is nested; $ is the outermost {}, .. means recursive (fuzzy) matching
        version = jsonpath.jsonpath(i, '$.im:version.label')
        rating = jsonpath.jsonpath(i, '$.im:rating.label')
        title = jsonpath.jsonpath(i, '$.title.label')
        content = jsonpath.jsonpath(i, '$.content.label')
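The comment mentions `..`, JSONPath's recursive-descent operator. The pure-Python sketch below shows what that operator matches: it collects every value stored under a given key at any depth, roughly what `$..label` returns. `find_all` is a hypothetical helper, and the sample entry is trimmed from the JSON above. One difference from the `jsonpath` package: that package returns `False` when nothing matches, whereas this helper returns an empty list.

```python
from typing import Any, List


def find_all(node: Any, key: str) -> List[Any]:
    """Collect values stored under `key` at any depth (like JSONPath `$..key`)."""
    found: List[Any] = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))  # keep descending into the value
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Trimmed from the sample entry above.
entry = {
    "author": {"name": {"label": "vivitaec"}, "label": ""},
    "im:rating": {"label": "5"},
    "title": {"label": "新系统很好用"},
}
print(find_all(entry, "label"))
```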

This only covers one page. To get every page, first find the total number of pages:

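import re


def get_page_number(url):
    # The feed's link list has an entry with rel="last"; its href contains the last page number
    result = get_content(url)
    page = [x["attributes"]["href"] for x in result["feed"]["link"] if x["attributes"]["rel"] == "last"]
    print(page[0])
    rex = re.search(r"page=(\d+)", page[0])  # raw string avoids the invalid "\d" escape warning
    page_number = rex.group(1)
    return int(page_number)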
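The same extraction can be checked offline against a hand-made feed fragment. This sketch makes two assumptions: it falls back to 1 page when no `rel="last"` link is present, and the `href` in the sample is made up for illustration rather than copied from a real response. `page_count_from_feed` is a hypothetical name.

```python
import re


def page_count_from_feed(feed_json: dict) -> int:
    """Return the page number in the rel="last" link, or 1 if there is none (assumption)."""
    for link in feed_json["feed"].get("link", []):
        attrs = link.get("attributes", {})
        if attrs.get("rel") == "last":
            match = re.search(r"page=(\d+)", attrs.get("href", ""))
            if match:
                return int(match.group(1))
    return 1


# Illustrative fragment shaped like the feed; the href values are made up.
sample = {
    "feed": {
        "link": [
            {"attributes": {"rel": "alternate", "href": "https://apps.apple.com/cn/app/id249015"}},
            {"attributes": {"rel": "last",
                            "href": "https://itunes.apple.com/cn/rss/customerreviews/page=10/id=249015/sortby=mostrecent/json"}},
        ]
    }
}
print(page_count_from_feed(sample))  # 10
```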

The code to fetch one page:

import jsonpath


def get_onepage(result):
    # Turn one page of feed JSON into rows of [version, name, rating, title, content]
    one_page = []
    for i in result["feed"]["entry"]:
        name = jsonpath.jsonpath(i, '$.author.name.label')  # Reviewer name; $ is the outermost {}, .. would match at any depth
        version = jsonpath.jsonpath(i, '$.im:version.label')
        rating = jsonpath.jsonpath(i, '$.im:rating.label')
        title = jsonpath.jsonpath(i, '$.title.label')
        content = jsonpath.jsonpath(i, '$.content.label')
        # jsonpath returns a list of matches, so take the first one of each
        one_page.append([version[0], name[0], rating[0], title[0], content[0]])
    return one_page
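If you would rather not depend on `jsonpath`, you can build the same row with plain dictionary lookups. This sketch makes two assumptions. Entries without `im:rating` (for example app metadata rather than a review) are skipped. A page holding a single review may arrive as one object instead of a list, so it gets wrapped. `get_onepage_plain` is a hypothetical name, and the sample review text is trimmed from the JSON above.

```python
def get_onepage_plain(result: dict) -> list:
    """Same columns as get_onepage: version, name, rating, title, content."""
    entries = result.get("feed", {}).get("entry", [])
    if isinstance(entries, dict):  # assumption: a lone entry may not be wrapped in a list
        entries = [entries]
    rows = []
    for e in entries:
        if "im:rating" not in e:  # assumption: no rating means it is not a review
            continue
        rows.append([
            e["im:version"]["label"],
            e["author"]["name"]["label"],
            e["im:rating"]["label"],
            e["title"]["label"],
            e["content"]["label"],
        ])
    return rows


sample_page = {"feed": {"entry": {
    "author": {"name": {"label": "vivitaec"}, "label": ""},
    "im:version": {"label": "2.1.9"},
    "im:rating": {"label": "5"},
    "title": {"label": "新系统很好用"},
    "content": {"label": "新系统挺好的", "attributes": {"type": "text"}},
}}}
print(get_onepage_plain(sample_page))
```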

A loop then collects all of them:

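import pandas as pd


def get_all(urls):
    # urls: the feed URL with {} in place of the page number, e.g.
    # "https://itunes.apple.com/rss/customerreviews/page={}/id=249015/sortby=mostrecent/json?l=en&cc=cn"
    total = get_page_number(urls.format(1))  # number of the last page
    total_item = []
    for i in range(1, total + 1):  # pages 1..total inclusive
        result = get_content(urls.format(i))
        total_item.extend(get_onepage(result))

    if total_item:
        df = pd.DataFrame(total_item, columns=["version", "name", "rating", "title", "content"])
        # to_excel needs an Excel engine such as openpyxl to be installed
        df.to_excel("C:\\work\\store.xlsx")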
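The loop visits pages 1 through the last page and concatenates the rows. The sketch below makes that page range explicit and takes the fetch function as a parameter, so the pagination logic can be checked without network access. `collect_pages`, `rows_to_csv`, the stub fetcher, and the switch to CSV (which, unlike `to_excel`, needs no extra package) are illustrative choices, not part of the original script.

```python
import csv
import io
from typing import Callable, List


def collect_pages(fetch_page: Callable[[int], List[list]], total_pages: int) -> List[list]:
    """Fetch pages 1..total_pages (inclusive) and concatenate their rows."""
    rows: List[list] = []
    for page in range(1, total_pages + 1):
        rows.extend(fetch_page(page))
    return rows


def rows_to_csv(rows: List[list]) -> str:
    """Serialize rows, with a header line, using the standard-library csv module."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["version", "name", "rating", "title", "content"])
    writer.writerows(rows)
    return buf.getvalue()


def fake(page: int) -> List[list]:
    # Stub standing in for get_onepage(get_content(...)): one row per page.
    return [["2.1.9", f"user{page}", "5", "title", "text"]]


rows = collect_pages(fake, total_pages=3)
print(len(rows))  # 3
print(rows_to_csv(rows).splitlines()[0])  # version,name,rating,title,content
```

To write the CSV to disk, open the file with `newline=""`. Using `encoding="utf-8-sig"` adds a byte-order mark that helps Excel display the Chinese text correctly.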

Fetching the JSON is simple too:

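import requests


def get_content(url):
    # Request headers: a browser-like User-Agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'
    }

    # Fetch the URL and parse the JSON body.
    # verify=False (kept from the original) disables TLS certificate checks; drop it unless you need it
    r = requests.get(url, headers=headers, verify=False, timeout=10)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    result = r.json()
    return result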
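`verify=False` turns off TLS certificate checking, and urllib3 then emits an `InsecureRequestWarning` on every request. The sketch below is a standard-library alternative that keeps verification on and sets a timeout. `get_json` is a hypothetical name. Its `opener` parameter is there only so the function can be tested offline. `urllib.request.urlopen` raises `HTTPError` for error status codes by default.

```python
import io
import json
import urllib.request


def get_json(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> dict:
    """Fetch `url` and decode its JSON body; TLS certificates are verified by default."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fake_opener(req, timeout):
    # Offline stand-in: returns a file-like object instead of hitting the network.
    return io.BytesIO(b'{"feed": {"entry": []}}')


print(get_json("https://example.invalid/feed.json", opener=fake_opener))
```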

Run it, and the reviews end up in the spreadsheet.
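The point of the exercise was to check a sudden rise in the score, so one quick way to look at the collected rows is to tally star ratings per app version. This minimal sketch assumes rows in the `[version, name, rating, title, content]` order that `get_onepage` produces. The sample rows are made up for the example.

```python
from collections import Counter, defaultdict


def ratings_by_version(rows):
    """Map each version string to a Counter of star ratings (as ints)."""
    table = defaultdict(Counter)
    for version, _name, rating, _title, _content in rows:
        table[version][int(rating)] += 1
    return dict(table)


rows = [
    ["2.1.8", "a", "3", "t", "c"],
    ["2.1.9", "b", "5", "t", "c"],
    ["2.1.9", "c", "5", "t", "c"],
    ["2.1.9", "d", "1", "t", "c"],
]
for version, counts in sorted(ratings_by_version(rows).items()):
    total = sum(counts.values())
    avg = sum(star * n for star, n in counts.items()) / total
    print(version, dict(counts), round(avg, 2))
```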

This article is part of the Tencent Cloud self-media syndication program and was shared from a WeChat official account.

Originally published: 2021-01-26. In case of infringement, contact cloudcommunity@tencent.com for removal.


This article is shared from the WeChat official account python粉丝团.
