Analyzing the Source Distribution of Web Logs with the Taobao IP Database

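Web access logs record the IP address of every visitor. Looking up the location of each IP and tallying the results gives the geographic distribution of visits, down to the province and city level.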

Taobao API endpoint: http://ip.taobao.com/service/getIpInfo.php?ip=14.215.177.38 (change the IP at the end as needed).
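As a small illustration of "change the IP as needed", the query URL can be assembled from the endpoint and an arbitrary address. This is only a sketch: the helper name `build_query_url` is hypothetical and not part of the original script, which simply concatenates the two strings.

```python
from urllib.parse import urlencode

# Endpoint taken from the article; the IP goes in the "ip" query parameter.
BASE_URL = "http://ip.taobao.com/service/getIpInfo.php"


def build_query_url(ip):
    """Return the lookup URL for one IP address (hypothetical helper)."""
    return BASE_URL + "?" + urlencode({"ip": ip})


print(build_query_url("14.215.177.38"))
```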

For example, querying 14.215.177.38 returns a record describing that address.

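The response is structured as a dictionary. `code` gives the query status (0 for success, 1 for failure), and the detail fields are the country, area, province (region), city, and ISP. Because the text is Unicode-escaped, Chinese characters appear as sequences such as \u4e2d; a Unicode-to-Chinese conversion tool will show the actual content.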
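Instead of an external conversion tool, a JSON parser can decode the `\uXXXX` escapes directly. The sketch below assumes the response body is valid JSON; the sample body is made up for illustration (only the field names follow the description above), and `parse_response` is a hypothetical helper.

```python
import json

# Hypothetical response body for illustration only; it is not real API output.
raw = ('{"code": 0, "data": {"country": "\\u4e2d\\u56fd", "country_id": "CN", '
       '"area": "\\u534e\\u5357", "region": "\\u5e7f\\u4e1c", '
       '"city": "\\u5e7f\\u5dde", "isp": "\\u7535\\u4fe1"}}')


def parse_response(body):
    """Return the data dict on success, or None when code is not 0."""
    result = json.loads(body)  # json.loads also decodes the \uXXXX escapes
    if result.get("code") != 0:
        return None
    return result["data"]


info = parse_response(raw)
print(info["country"], info["region"], info["city"], info["isp"])
```

Parsing with `json.loads` also avoids calling `eval` on data received from the network.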

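Requirement: determine the province of each visiting IP (all foreign IPs are grouped into one bucket) and compute each province's share of the traffic. The IPs in the log are first preprocessed and saved in a "count IP" format, one pair per line: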
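One possible way to produce the "count IP" lines is sketched below. It assumes the client IP is the first whitespace-separated field of each log line, as in the common access log format; the sample lines and the helper names `count_ips` and `format_counts` are hypothetical.

```python
from collections import Counter


def count_ips(log_lines):
    """Count how often each client IP (first field of a line) appears."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def format_counts(counts):
    """Render the counts as 'count IP' lines, most frequent first."""
    return ["%d %s" % (n, ip) for ip, n in counts.most_common()]


# Made-up log lines using documentation-only IP ranges.
sample = [
    '192.0.2.1 - - [01/Jan/2020:00:00:01 +0800] "GET / HTTP/1.1" 200 512',
    '198.51.100.7 - - [01/Jan/2020:00:00:02 +0800] "GET /a HTTP/1.1" 200 128',
    '192.0.2.1 - - [01/Jan/2020:00:00:03 +0800] "GET /b HTTP/1.1" 404 0',
]

for row in format_counts(count_ips(sample)):
    print(row)
```

Writing these rows to `ips.txt` gives the input file that the script reads.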

The code is as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import urllib.request

bs_url = "http://ip.taobao.com/service/getIpInfo.php?ip="

# Global dictionary holding the final statistics,
# in the form {"province": {"IP": count, ...}, ...}
region_dic = {}


# Look up one IP and record it in the dictionary above
def get_data(ip, weight=1):
    city = ""
    area = ""
    country = ""
    region = ""
    isp = ""
    request = urllib.request.Request(bs_url + ip)
    response = urllib.request.urlopen(request)
    # Parse the body as JSON; this also decodes the \uXXXX escapes
    result = json.loads(response.read().decode("utf-8"))
    # print(result)

    code = result["code"]
    if code == 0:
        data = result["data"]
        if data["country_id"] == "CN":
            city = data["city"]
            area = data["area"]
            country = data["country"]
            region = data["region"]
            isp = data["isp"]
        else:
            region = "Foreign"
        # print(region)
        if region not in region_dic:
            region_dic[region] = {}
        region_dic[region][ip] = int(weight)
    else:
        print("request error")
    # print("IP: %s\nCity: %s\nArea: %s\nCountry: %s\nRegion: %s\nISP: %s"
    #       % (ip, city, area, country, region, isp))


if __name__ == "__main__":
    count = 0
    # The IPs to analyze are stored in a file, one "count IP" pair per line
    with open("ips.txt", "r") as fo:
        for line in fo:
            weight, ip = line.strip().split()
            get_data(ip, weight)
            count += int(weight)

    print("Total:")
    for region, stats in region_dic.items():
        times = sum(stats.values())
        print("%s: %.2f%%" % (region, 100 * times / count))

Run result: the script prints each region's share of the total visits.
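The percentage calculation can be checked offline against a small hand-built dictionary in the {"province": {"IP": count}} layout used by the script. The figures below are invented for illustration and are not real log data; `region_shares` is a hypothetical helper.

```python
def region_shares(region_dic):
    """Return each region's share of all visits, as a percentage."""
    total = sum(sum(stats.values()) for stats in region_dic.values())
    if total == 0:
        return {}
    return {
        region: 100.0 * sum(stats.values()) / total
        for region, stats in region_dic.items()
    }


# Invented sample: four visits from one province, one from abroad.
sample_dic = {
    "Guangdong": {"192.0.2.1": 3, "192.0.2.2": 1},
    "Foreign": {"198.51.100.7": 1},
}

shares = region_shares(sample_dic)
for region, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print("%s: %.2f%%" % (region, share))
# prints:
# Guangdong: 80.00%
# Foreign: 20.00%
```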

Note: other available IP database interfaces include the Sina interface: http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=js&ip=14.215.177.38