Viewing All the Features Contained in a TFRecord File

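TFRecord is a widely used data format in TensorFlow: it is cross-platform, space-efficient, and fast to read. Because TensorFlow has so many developers, standardizing the file format used for training data is well worth doing, and it also lowers learning and migration costs.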

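However, TFRecord data is stored in a binary format and cannot be viewed directly. A convenient way to inspect the structure and contents of a TFRecord file is therefore especially important.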
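To make "binary format" concrete: an uncompressed TFRecord file is a sequence of framed records, each consisting of an 8-byte little-endian length, a 4-byte masked CRC of that length, the payload bytes, and a 4-byte masked CRC of the payload. The following is a minimal sketch that walks this framing with the Python standard library only. The names `iter_tfrecord_payloads` and `frame_record` are hypothetical, CRC verification is deliberately skipped, and `frame_record` zeroes the CRC fields, so its output is for illustration and would not be accepted by TensorFlow's own reader.

```python
import io
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload bytes of each record in an uncompressed TFRecord stream."""
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte masked CRC of the length
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # masked CRC of the payload (not verified in this sketch)
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def frame_record(payload):
    """Frame a payload like a TFRecord record, with zeroed CRC fields (illustration only)."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


# Two fake records in memory stand in for a file opened with open(path, "rb").
demo = io.BytesIO(frame_record(b"first") + frame_record(b"second record"))
payloads = list(iter_tfrecord_payloads(demo))
print(payloads)
```

In a real file each payload is typically a serialized `tf.train.Example`, which is why a text editor shows nothing readable.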

Why would you need to inspect TFRecord data? Let's first look at the key steps of a typical TFRecord write and read.

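    # 1. Writing
    # For one image, write its content, label, height, and width.
    # bytes_feature / int64_feature are assumed to be helper functions that
    # wrap a value in tf.train.Feature; they are not shown here.
    tf_example = tf.train.Example(
        features=tf.train.Features(feature={
            'encoded': bytes_feature(encoded_jpg),
            'label': int64_feature(label),
            'height': int64_feature(height),
            'width': int64_feature(width)}))

    # 2. Reading
    # Define the TFRecord format to parse
    def _parse_image(example_proto):
        features = {'encoded': tf.FixedLenFeature((), tf.string),
                    'label': tf.FixedLenFeature((), tf.int64),
                    'height': tf.FixedLenFeature((), tf.int64),
                    'width': tf.FixedLenFeature((), tf.int64)
                    }
        return tf.parse_single_example(example_proto, features)

    # Parse each TFRecord record into the actual values according to features
    ds = ds.map(lambda x: _parse_image(x), num_parallel_calls=4)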

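The above shows part of a standard TFRecord write and read. As you may have noticed, to read TFRecord data you must know the feature names and types stored in it; if any one of them does not match, the data cannot be retrieved.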
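The mismatch problem can be reproduced in a few lines. The sketch below assumes TensorFlow 2.x with eager execution (so it uses the `tf.io.*` names rather than the TF 1.x names in the snippet above); the helper `try_parse` and the feature name `label` are made up for the illustration. It parses the same serialized record with a correct spec, a misspelled name, and a wrong type.

```python
import tensorflow as tf

# One record with a single int64 feature named "label".
example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
}))
serialized = example.SerializeToString()


def try_parse(spec):
    """Parse the record with the given spec; report a failure instead of raising."""
    try:
        parsed = tf.io.parse_single_example(serialized, spec)
        return {key: value.numpy() for key, value in parsed.items()}
    except tf.errors.InvalidArgumentError as err:
        return "failed: " + type(err).__name__


ok = try_parse({"label": tf.io.FixedLenFeature((), tf.int64)})
wrong_name = try_parse({"labels": tf.io.FixedLenFeature((), tf.int64)})
wrong_type = try_parse({"label": tf.io.FixedLenFeature((), tf.float32)})

print(ok)
print(wrong_name)
print(wrong_type)
```

Only the first spec yields data; the other two fail even though the record itself is perfectly valid.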

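If the same person does both the writing and the reading, this is not a problem. But when writing and reading are split across teams, having to ask the other side for the complete list of feature names and types before every read is far too inefficient. After all, the TFRecord data already contains everything you need, so you can help yourself.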

So how do you inspect TFRecord data? Use the Python method tf.train.Example.FromString(serialized_example), whose argument is one serialized record string taken from the TFRecord file.
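As a minimal sketch of that call, the example below builds a record in memory, serializes it, and decodes it again with `FromString`, without touching a file. The feature names (`label`, `score`, `name`) are invented for the illustration; only the protobuf API of `tf.train.Example` is used.

```python
import tensorflow as tf

# Build and serialize a record, as a writer on another team might do.
original = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    "score": tf.train.Feature(float_list=tf.train.FloatList(value=[0.5])),
    "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"cat"])),
}))
serialized_example = original.SerializeToString()

# Decode it without knowing the feature names or types in advance.
decoded = tf.train.Example.FromString(serialized_example)
feature_names = sorted(decoded.features.feature.keys())
label_values = list(decoded.features.feature["label"].int64_list.value)

print(feature_names)
print(label_values)
```

The decoded message carries both the feature names and their values, which is exactly the information a reader otherwise has to ask the writer for.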

I then wrapped the inspection process described above into a Python script; feel free to take it if you need it.

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import sys
    import tensorflow as tf

    # Usage: python trackTFRecord.py True file1 file2
    # trackTFRecord.py is this script
    # True controls whether the actual values are printed
    # file1 file2 are the absolute paths of the TFRecord files to inspect
    # Output note: tf.float32 corresponds to the TFRecord FloatList,
    # tf.int64 to Int64List, and tf.string to BytesList
    def main():
        print("Number of TFRecord files: {0}".format(len(sys.argv) - 2))
        for i in range(2, len(sys.argv)):
            filepath = sys.argv[i]
            with tf.Session() as sess:
                filenames = [filepath]
                # Load the TFRecord data
                ds = tf.data.TFRecordDataset(filenames)
                ds = ds.batch(10)
                ds = ds.prefetch(buffer_size=tf.contrib.data.AUTOTUNE)
                iterator = ds.make_one_shot_iterator()
                # To save time, fetch just one batch and look at its structure
                batch_data = iterator.get_next()
                res = sess.run(batch_data)
                serialized_example = res[0]
                example_proto = tf.train.Example.FromString(serialized_example)
                features = example_proto.features
                print('{0} contains the following:'.format(filepath))
                for key in features.feature:
                    feature = features.feature[key]
                    ftype = None
                    fvalue = None
                    if len(feature.bytes_list.value) > 0:
                        ftype = 'bytes_list'
                        fvalue = feature.bytes_list.value

                    if len(feature.float_list.value) > 0:
                        ftype = 'float_list'
                        fvalue = feature.float_list.value

                    if len(feature.int64_list.value) > 0:
                        ftype = 'int64_list'
                        fvalue = feature.int64_list.value

                    result = '{0} : {1}'.format(key, ftype)
                    if 'True' == sys.argv[1]:
                        result = '{0} : {1}'.format(result, fvalue)
                    print(result)

    if __name__ == "__main__":
        main()
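The script above targets TensorFlow 1.x: `tf.Session`, `make_one_shot_iterator`, and `tf.contrib` are not available under those names in TensorFlow 2.x. For readers on 2.x, the same idea might look roughly like the sketch below. It is an adaptation under that assumption, not the author's script; the function name `describe_tfrecord` and the demo feature names are hypothetical. It uses the protobuf `WhichOneof("kind")` call to read the list type of each feature instead of checking the three list lengths.

```python
import os
import tempfile

import tensorflow as tf


def describe_tfrecord(path):
    """Return {feature name: list type} for the first record of a TFRecord file."""
    summary = {}
    for raw in tf.data.TFRecordDataset([path]).take(1):
        example = tf.train.Example.FromString(raw.numpy())
        for key, feature in example.features.feature.items():
            # "kind" is the oneof holding bytes_list, float_list, or int64_list;
            # WhichOneof returns None if none of them is set.
            summary[key] = feature.WhichOneof("kind")
    return summary


# Demo: write one small record to a temporary file, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
demo_example = tf.train.Example(features=tf.train.Features(feature={
    "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[2000])),
    "float_val": tf.train.Feature(float_list=tf.train.FloatList(value=[9.99])),
}))
with tf.io.TFRecordWriter(demo_path) as writer:
    writer.write(demo_example.SerializeToString())

summary = describe_tfrecord(demo_path)
for name in sorted(summary):
    print("{0} : {1}".format(name, summary[name]))
```

Like the original script, this looks only at the first record, so it assumes every record in the file shares the same set of features.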

Here is a worked demonstration. First, pick any image and write it to a TFRecord file.

    import tensorflow as tf

    filename = "/Users/zhanhaitao/Desktop/1.png"
    # Read the image data with tf.read_file
    image = tf.read_file(filename)
    # Decode mainly to obtain the image's width and height
    image_jpeg = tf.image.decode_jpeg(image, channels=3, name="decode_jpeg_picture")
    # Reshape the image to its original size, 2500x2000x3
    image_jpeg = tf.reshape(image_jpeg, shape=(2500, 2000, 3))
    # Get the image shape
    img_shape = image_jpeg.shape
    width = img_shape[0]
    height = img_shape[1]
    # Evaluate the original image tensor to a bytes object; image is what gets saved to the tfrecord
    sess = tf.Session()
    image = sess.run(image)
    sess.close()
    # Path and file name of the TFRecords file
    path_none = "/Users/zhanhaitao/Desktop/a.tfrecord"
    # Create a TFRecordWriter; options=None means no compression
    writer_none = tf.python_io.TFRecordWriter(path_none, options=None)
    # Wrap the features in an Example
    example_none = tf.train.Example(features=tf.train.Features(feature={
        "float_val": tf.train.Feature(float_list=tf.train.FloatList(value=[9.99])),
        "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image]))
    }))
    # Serialize the Example to a string
    example_str_none = example_none.SerializeToString()
    # Write the serialized string to the TFRecords file
    writer_none.write(example_str_none)

    # Close the TFRecords writer
    writer_none.close()

    print("finish to write data to tfrecord file!")