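Visualizing Epidemic Data
This post scrapes the data behind an epidemic map and, in a later step, displays it as a word cloud. The data comes from Baidu: https://voice.baidu.com/act/newpneumonia/newpneumonia
Part 1: Getting the data
The following prints the page content returned by the URL: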
    import requests
    import json
    from lxml import etree
    import openpyxl

    url = "https://voice.baidu.com/act/newpneumonia/newpneumonia"
    response = requests.get(url)
    print(response.text)
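A bare requests.get call has no timeout and will happily hand back an error page. As a hedged sketch, the fetch could be wrapped as below. The helper name fetch_html, the injectable get parameter, the 10-second timeout, and the User-Agent value are all assumptions made for illustration; none of them come from the original script.

```python
def fetch_html(url, get=None, timeout=10):
    """Return the decoded page body, raising on HTTP errors."""
    if get is None:
        # Imported lazily so the helper can also be driven by a stand-in getter.
        import requests
        get = requests.get
    # Assumption: a browser-like User-Agent; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    # Prefer the detected encoding so Chinese text is not garbled
    # when the server omits a charset.
    response.encoding = response.apparent_encoding or response.encoding
    return response.text
```

With this in place, the page could be fetched with fetch_html(url), and the rest of the script would use the returned string where it currently uses response.text.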
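Open the page source and use Ctrl+F to search it quickly.
The data sits in a script tag whose type is application/json,
and the epidemic figures only start at caseList inside component.
Parse that script's content as JSON, then read caseList from the first component object: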
    html = etree.HTML(response.text)
    # Select the text of the embedded JSON script tag
    result = html.xpath('//script[@type="application/json"]/text()')
    result = result[0]
    result = json.loads(result)
    # print(result['component'][0]['globalList'])
    result1 = result['component'][0]['caseList']
    for each in result1:
        print(each)
        print('*' * 50)
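To see the extraction step in isolation, here is a minimal sketch that uses the standard library's html.parser in place of lxml and runs against a tiny made-up page rather than the live site. The class JsonScriptParser, the function extract_case_list, and the sample HTML are hypothetical; only the component[0]['caseList'] path is taken from the post's own code.

```python
import json
from html.parser import HTMLParser


class JsonScriptParser(HTMLParser):
    """Collect the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_case_list(page_html):
    """Return component[0]['caseList'] from the first embedded JSON script."""
    parser = JsonScriptParser()
    parser.feed(page_html)
    parser.close()
    if not parser.blocks:
        raise ValueError("no application/json script found")
    payload = json.loads(parser.blocks[0])
    return payload["component"][0]["caseList"]


# Made-up page with the same nesting as the real one
sample = (
    '<html><body><script type="application/json">'
    '{"component": [{"caseList": [{"area": "A", "confirmed": "3"}]}]}'
    "</script></body></html>"
)
print(extract_case_list(sample))
```

Raising ValueError when no matching script is found gives a clearer failure than the IndexError that result[0] would produce if the page layout ever changed.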
Save the data to Excel:
    # Create the workbook
    wb = openpyxl.Workbook()
    # Use the default worksheet
    ws = wb.active
    ws.title = "Domestic"
    ws.append(['Province', 'Confirmed', 'Deaths', 'Recovered', 'Active',
               'New confirmed', 'New deaths', 'New recovered', 'Change in active'])
    for each in result1:
        # 'crued' is the key as spelled in the page data;
        # curConfirm is the field that matches the 'Active' column
        temp_list = [each['area'], each['confirmed'], each['died'], each['crued'],
                     each['curConfirm'], each['confirmedRelative'],
                     each['diedRelative'], each['curedRelative'],
                     each['curConfirmRelative']]
        # Replace empty fields with '0'
        for i in range(len(temp_list)):
            if temp_list[i] == '':
                temp_list[i] = '0'
        ws.append(temp_list)
    wb.save('./data.xlsx')
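The row-building loop indexes each record directly, so a single missing key would stop the export with a KeyError. One possible way to make that step more forgiving is sketched below. The helper build_row, its default parameter, and the sample record are hypothetical and only illustrate the idea.

```python
def build_row(record, fields, default="0"):
    """Return the values of `fields` from `record`, filling missing or empty ones."""
    row = []
    for field in fields:
        value = record.get(field, "")  # a missing key is treated like an empty field
        row.append(default if value in ("", None) else value)
    return row


# Field names as they appear in the page data ('crued' is spelled that way there)
FIELDS = ["area", "confirmed", "died", "crued"]

sample = {"area": "A", "confirmed": "12", "died": ""}
print(build_row(sample, FIELDS))  # ['A', '12', '0', '0']
```

Each row could then be written with ws.append(build_row(each, FIELDS)), keeping the field list in one place next to the header row.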
The result is saved to data.xlsx.
To get the data for other countries, change the code to read globalList instead:
    result2 = result['component'][0]['globalList']
    for each in result2:
        print(each)
        print('*' * 50)

    # Create the workbook
    wb = openpyxl.Workbook()
    # Use the default worksheet
    ws = wb.active
    ws.title = "Global"
    ws.append(['Region', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
    for each in result2:
        temp_list = [each['area'], each['confirmed'], each['died'], each['crued'],
                     each['confirmedRelative'], each['curConfirm']]
        # Replace empty fields with '0'
        for i in range(len(temp_list)):
            if temp_list[i] == '':
                temp_list[i] = '0'
        ws.append(temp_list)
    wb.save('./data1.xlsx')
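Next, split the data by continent. Each entry in globalList carries a subList with the countries of that continent.
For example (欧洲 is Europe): {'area': '欧洲', 'subList': [{'died': '52', 'confirmed': '2629', 'crued': '1535',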
    result2 = result['component'][0]['globalList']
    for each in result2:
        print(each)
        print('*' * 50)

    # Create the workbook
    wb = openpyxl.Workbook()
    # Use the default worksheet for the per-continent summary
    ws = wb.active
    ws.title = "Global"
    ws.append(['Region', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
    for each in result2:
        temp_list = [each['area'], each['confirmed'], each['died'], each['crued'],
                     each['confirmedRelative'], each['curConfirm']]
        # Replace empty fields with '0'
        for i in range(len(temp_list)):
            if temp_list[i] == '':
                temp_list[i] = '0'
        ws.append(temp_list)

    for each in result2:
        sheet_title = each['area']
        # Create one worksheet per continent
        ws_out = wb.create_sheet(sheet_title)
        ws_out.append(['Country', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
        for country in each['subList']:
            temp_list = [country['country'], country['confirmed'], country['died'],
                         country['crued'], country['confirmedRelative'],
                         country['curConfirm']]
            ws_out.append(temp_list)
    wb.save('./data1.xlsx')
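The nested loop over subList can also be looked at on its own, separately from the Excel writing. The sketch below flattens a made-up globalList-shaped structure into one row per country. The generator flatten_regions and the sample data are hypothetical; the key names (area, subList, country) are the ones used in the post.

```python
def flatten_regions(global_list, fields):
    """Yield one row per country: [region, country, *field values]."""
    for region in global_list:
        # A default of [] guards against a region that has no subList.
        for country in region.get("subList", []):
            # Empty or missing values become '0', as in the summary sheet.
            values = [country.get(field, "") or "0" for field in fields]
            yield [region["area"], country.get("country", ""), *values]


# Made-up data with the same nesting as globalList
sample = [
    {"area": "R1", "subList": [
        {"country": "C1", "confirmed": "5", "died": ""},
        {"country": "C2", "confirmed": "7", "died": "1"},
    ]},
    {"area": "R2"},
]

for row in flatten_regions(sample, ["confirmed", "died"]):
    print(row)
```

A flat list like this fits on a single sheet with a region column, which can be easier to filter than one sheet per continent. Which layout is preferable depends on how the data will be read afterwards.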
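The result is shown in the figure.
That wraps up the data cleaning. For the word cloud analysis of the epidemic data, see the next post.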