open() and codecs.open() behave strangely differently in Python 2.7

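Question:

I have a text file whose first line contains Unicode characters and whose remaining lines are all ASCII. I am trying to read the first line into one variable and all the other lines into another. However, when I use the following code: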

# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

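If I don't use readlines(), the entire file is read, not just the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() read the file in binary mode even though I passed the 'r' argument?

Update: here is the original file: http://www1.datafilehost.com/d/0792d687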

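Answer:

Because you called .readline() first, the codecs.open() file object has already filled a line buffer; the subsequent call to .readlines() returns only the lines left in that buffer.

If you call .readlines() again, the remaining lines are returned: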

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71

The workaround is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line off the list
f.close()
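
As a self-contained illustration of that workaround, here is a minimal sketch that builds a small sample file and reads it back with a single .readlines() call. The helper names write_sample and read_header_and_data, the sample contents, and the line counts are assumptions made up for this example; they are not taken from the original file.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def write_sample(path, n_names=28, n_data=77):
    """Hypothetical helper: one non-ASCII header line, then ASCII data lines."""
    header = u' '.join(u'\u043d%d' % i for i in range(n_names))
    with codecs.open(path, 'w', encoding='utf-8') as f:
        f.write(header + u'\n')
        for i in range(n_data):
            f.write(u'row %d\n' % i)


def read_header_and_data(path, encoding='utf-8'):
    """Return (names, data_lines) without mixing readline() and readlines()."""
    with codecs.open(path, 'r', encoding=encoding) as f:
        lines = f.readlines()  # one call reads the whole file
    if not lines:
        return [], []
    names = lines[0].split(u' ')  # first line holds the names
    return names, lines[1:]       # everything else is data


if __name__ == '__main__':
    sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    write_sample(sample_path)
    names, data = read_header_and_data(sample_path)
    print(len(names))
    print(len(data))
```

Slicing the list instead of calling .pop(0) is only a stylistic choice here; both leave the header and the data in separate variables.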

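This behaviour really is a bug. The Python developers are aware of it; see issue 8260.

Another option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is considerably more robust and versatile than the codecs module.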
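
A minimal sketch of the io.open() route, under the same assumptions as before (the helper names and the sample file are hypothetical and exist only for this example). With a file object from io.open(), calling .readline() and then .readlines() should return the first line followed by all of the remaining lines:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_sample(path, n_names=28, n_data=77):
    """Hypothetical helper: one non-ASCII header line, then ASCII data lines."""
    header = u' '.join(u'\u043d%d' % i for i in range(n_names))
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(header + u'\n')
        for i in range(n_data):
            f.write(u'row %d\n' % i)


def read_header_and_data(path, encoding='utf-8'):
    """Read the first line, then the rest, from an io.open() text file."""
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().split(u' ')  # first line only
        data = f.readlines()              # every remaining line
    return names, data


if __name__ == '__main__':
    sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    write_sample(sample_path)
    names, data = read_header_and_data(sample_path)
    print(len(names))
    print(len(data))
```

io.open() is available in Python 2.6 and later, so the same code runs on Python 2.7 and Python 3; it returns unicode strings, which is why the literals above carry the u prefix.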
