Tianyancha Data API

Tianyancha's anti-scraping measures

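| User authentication | IP limit | Login CAPTCHA | Risk-control behavior detection | VIP required for core data? |
| --- | --- | --- | --- | --- |
| Login required for access | 50 requests per IP | GeeTest CAPTCHA | Redirects to a verification page, where a simple behavioral CAPTCHA must be passed | Required for risk, financial, and similar data |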

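The news database is not core data, so a free registered account is enough to access it. After registering, maintain a cookie pool with just two jobs: checking that saved cookies are still valid, and logging in to Tianyancha to save fresh cookies.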
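The cookie pool described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the author's implementation: the class name `CookiePool`, its methods, and the `is_valid` callback are all hypothetical, and the validity check itself (for example, a test request with the cookie) is left to the caller.

```python
from typing import Callable, Dict, List, Optional

Cookie = Dict[str, str]


class CookiePool:
    """Keeps saved cookies and hands out ones that still pass a validity check."""

    def __init__(self, is_valid: Callable[[Cookie], bool]):
        # is_valid is a hypothetical caller-supplied check (e.g. a test request).
        self._is_valid = is_valid
        self._cookies: List[Cookie] = []

    def add(self, cookie: Cookie) -> None:
        """Save a cookie obtained after logging in."""
        self._cookies.append(cookie)

    def prune(self) -> int:
        """Drop cookies that fail the check; return how many were removed."""
        before = len(self._cookies)
        self._cookies = [c for c in self._cookies if self._is_valid(c)]
        return before - len(self._cookies)

    def get(self) -> Optional[Cookie]:
        """Return a valid cookie in round-robin order, or None if none is left."""
        self.prune()
        if not self._cookies:
            return None
        cookie = self._cookies.pop(0)
        self._cookies.append(cookie)  # rotate so the load is spread out
        return cookie
```

Rotating the cookies on each `get()` is one way to spread requests across accounts; a real pool would likely also persist cookies between runs.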

About cookies

1 Browser console

document.cookie

2 Run a script

function getCookie() { var cookie = document.cookie; return cookie; }; getCookie()
import ast
import re
import requests

cookie_str = 'jsid=SEO-GOOGLE-ALL-SY-000001; TYCID=d0744580d32e11ecbe3dc57af635e3b5; _bl_uid=kClIm3eg51L9ntdFR83tdgvvzR99; ssuid=2090654984; _ga=GA1.2.1966907517.1652495881; csrfToken=_lgILD0y0X2VvYCH9WYiZ-Hd; bannerFlag=true; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22219960049%22%2C%22first_id%22%3A%22180c06dc366a4a-06f8cd088f5a2d-34726702-2073600-180c06dc36710b1%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22%24device_id%22%3A%22180c06dc366a4a-06f8cd088f5a2d-34726702-2073600-180c06dc36710b1%22%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfbG9naW5faWQiOiIyMTk5NjAwNDkiLCIkaWRlbnRpdHlfY29va2llX2lkIjoiMTgwZjNiODFjMDUzNzAtMGY2ZTVhOTljZjhhMDItMWU1MzU2MzMtMjA3MzYwMC0xODBmM2I4MWMwNjEwMzIifQ%3D%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%24identity_login_id%22%2C%22value%22%3A%22219960049%22%7D%7D; tyc-user-info=%7B%22state%22%3A%220%22%2C%22vipManager%22%3A%220%22%2C%22mobile%22%3A%2218827603962%22%7D; tyc-user-info-save-time=1653356441552; auth_token=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODgyNzYwMzk2MiIsImlhdCI6MTY1MzM1NjQ0MSwiZXhwIjoxNjU1OTQ4NDQxfQ._mF-UJeQsLBiSbKQnneoOw-yjGQ0qkYuDatYz-gB6-oNaEGXKkU4pcrU8Uvr_EdQzKKv5uS8slHip0jI45SCFw; Hm_lvt_e92c8d65d92d534b0fc290df538b4758=1652495864,1653380493; _gid=GA1.2.1692265788.1653380495; cloud_token=db89f11129774239b2c8b18b49170bbb; RTYCID=388134f9c83749119ab300c6c7f2a879; Hm_lpvt_e92c8d65d92d534b0fc290df538b4758=1653381048'




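def tran_cookies_str_to_dict(cookie_str):
    """Convert a browser cookie string ("k1=v1; k2=v2") into a dict."""
    # Turn every "key=value; " pair into a '"key":"value",' line.
    first = re.sub("(.*?)=(.*?); ", '"\\1":"\\2",\n', cookie_str)
    # The last pair has no trailing "; ", so quote it separately.
    sec = re.sub(",\n([^\"]*?)=(.*)", ',\n"\\1":"\\2"', first)
    # Wrap the result in braces and evaluate it as a dict literal.
    sec = "{" + sec + "}"
    return ast.literal_eval(sec)


if __name__ == '__main__':
    print(tran_cookies_str_to_dict(cookie_str))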
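The regex version above relies on the values containing no quotes or backslashes and on there being at least two cookies. As a possible alternative, the same conversion can be sketched with plain string splitting. The function name `cookie_str_to_dict` and the sample string are made up for illustration.

```python
def cookie_str_to_dict(cookie_str: str) -> dict:
    """Split a "k1=v1; k2=v2" cookie string into a dict."""
    cookies = {}
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first "=" only, so values containing "=" stay intact.
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


if __name__ == "__main__":
    sample = "csrfToken=abc123; bannerFlag=true"  # made-up sample values
    print(cookie_str_to_dict(sample))
```

The resulting dict can be passed straight to the `cookies=` argument of `requests.get`.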

Tianyancha news API

Code for requesting the Tianyancha news page


import requests



# Only auth_token is required; the other browser cookies are optional.
cookies = {
    'auth_token': 'eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODgyNzYwMzk2MiIsImlhdCI6MTY1MzM1NjQ0MSwiZXhwIjoxNjU1OTQ4NDQxfQ._mF-UJeQsLBiSbKQnneoOw-yjGQ0qkYuDatYz-gB6-oNaEGXKkU4pcrU8Uvr_EdQzKKv5uS8slHip0jI45SCFw',
}

headers = {
    'authority': 'www.tianyancha.com',
    'accept': '*/*',
    'accept-language': 'zh,zh-CN;q=0.9',
    'dnt': '1',
    'referer': 'https://www.tianyancha.com/company/11684584',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}

params = {
    'TABLE_DIM_NAME': 'findNewsCount',
    'ps': '10',
    'pn': '10',
    'id': '11684584',
    'name': '中航重机股份有限公司',
    'companyBizType': '8',
    '_': '1653380498166',
}

response = requests.get(
    'https://www.tianyancha.com/pagination/findNewsCount.xhtml',
    params=params,
    cookies=cookies,
    headers=headers,
)
print(response.text)

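cookies is converted from the cookie string copied out of the browser. Every cookie field except auth_token is optional. auth_token is normally issued by the server, and it cannot be generated without knowing the signing algorithm, so the only option is to log in to Tianyancha some other way and save this field for reuse.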
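The example auth_token consists of three dot-separated base64url segments, which is the shape of a JSON Web Token (JWT). Assuming it is one and that its payload carries the standard `exp` claim, the expiry time can be read locally without contacting the server, which may help when checking whether a saved cookie is still worth using. The helper names `jwt_payload` and `is_expired` are hypothetical, and this sketch only decodes the payload; it does not verify the signature.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the payload (middle segment) of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # base64url omits "=" padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's "exp" claim (seconds since epoch) has passed."""
    now = time.time() if now is None else now
    return jwt_payload(token)["exp"] <= now
```

A token that has not expired can still be rejected by the server (for example after a logout), so a local check like this complements a real test request rather than replacing it.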


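params.TABLE_DIM_NAME: name of the data table. relatedAnnouncement is announcements and research reports; findNewsCount is news and public sentiment
params.id: the company's ID in Tianyancha's database
params.name: name of the company being queried
params.ps: page size
params.pn: page number

params.name and params.id correspond one-to-one
params.companyBizType: [optional] purpose unknown
params._: [optional] timestamp in milliseconds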
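To show how the parameters above fit together, here is a small sketch that builds the query for a given page. The function name `build_news_params` and its defaults are assumptions made for illustration; `companyBizType` is left out because it is optional and its meaning is unknown.

```python
import time


def build_news_params(company_id: str, company_name: str,
                      page: int = 1, page_size: int = 10,
                      table: str = "findNewsCount") -> dict:
    """Build the query parameters for one page of results."""
    return {
        "TABLE_DIM_NAME": table,
        "ps": str(page_size),
        "pn": str(page),
        "id": company_id,
        "name": company_name,
        # Optional timestamp in milliseconds, as in the captured request.
        "_": str(int(time.time() * 1000)),
    }


if __name__ == "__main__":
    print(build_news_params("11684584", "中航重机股份有限公司", page=2))
```

Passing `table="relatedAnnouncement"` would select the announcements and research reports table instead, assuming that table is served with the same set of parameters.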

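Response format of the Tianyancha web API

The response is not JSON. It is a fragment of already-rendered front-end HTML, which a third-party library (parsel in the code below) can parse.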

from parsel import Selector
import requests



cookies = {
    'auth_token': 'eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxODgyNzYwMzk2MiIsImlhdCI6MTY1MzM1NjQ0MSwiZXhwIjoxNjU1OTQ4NDQxfQ._mF-UJeQsLBiSbKQnneoOw-yjGQ0qkYuDatYz-gB6-oNaEGXKkU4pcrU8Uvr_EdQzKKv5uS8slHip0jI45SCFw',
}

headers = {
    'authority': 'www.tianyancha.com',
    'accept': '*/*',
    'accept-language': 'zh,zh-CN;q=0.9',
    'dnt': '1',
    'referer': 'https://www.tianyancha.com/company/11684584',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}

params = {
    'TABLE_DIM_NAME': 'findNewsCount',
    'ps': '10',
    'pn': '10',
    'id': '11684584',
    'name': '中航重机股份有限公司',
    'companyBizType': '8',
    '_': '1653380498166',
}

response = requests.get(
    'https://www.tianyancha.com/pagination/findNewsCount.xhtml',
    params=params,
    cookies=cookies,
    headers=headers,
)

# Parse the returned HTML fragment.
selector = Selector(text=response.text)

# Each news item sits in its own "company-news-content" div.
news_contents = selector.xpath('//div[@class="company-news-content"]')
for content in news_contents:
    link = content.xpath('.//div[1]/a/@href').extract()
    title = content.xpath('.//div[1]/a/text()').extract()
    tags = content.xpath('./div[@class="news-tags"]//span/text()').extract()
    # The abstract appears under one of two class names.
    abstract_news = content.xpath('./div[@class="abstracts -new"]//text()').extract()
    abstract = content.xpath('./div[@class="abstracts "]//text()').extract()
    source = content.xpath('./div[@class="infos"]/span[1]/text()').extract()
    time_ = content.xpath('./div[@class="infos"]/span[2]/text()').extract()
    company = content.xpath('./div[@class="infos"]/span[3]/a//text()').extract()
    # Fall back to the plain abstract when the "-new" variant is missing.
    if not abstract_news:
        abstract_news = abstract
    print(company)



Tianyancha Data API
https://kingjem.github.io/2022/05/25/天某查数据接口/
Author: Ruhai
Published: May 25, 2022