Parsing the string representation of a numpy array
Question:
If I only have the string representation of a numpy.array:
>>> import numpy as np
>>> arr = np.random.randint(0, 10, (4, 4))
>>> print(arr)  # this one!
[[9 4 7 3]
 [1 6 4 2]
 [6 7 6 0]
 [0 5 6 7]]
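How can I convert this back to a numpy array? Inserting the commas by hand is not complicated, but I am looking for a programmatic approach.
A simple regex that replaces each run of whitespace with a comma actually works for single-digit integers: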
>>> import re
>>> sub = re.sub(r'\s+', ',', """[[8 6 2 4 0 2]
...  [3 5 8 4 5 6]
...  [4 6 3 3 0 3]]
... """)
>>> sub
'[[8,6,2,4,0,2],[3,5,8,4,5,6],[4,6,3,3,0,3]],'  # the trailing "," is a bit annoying
This can be converted to an almost identical array (the dtype might be lost, but that is fine):
>>> import ast
>>> np.array(ast.literal_eval(sub)[0])
array([[8, 6, 2, 4, 0, 2],
       [3, 5, 8, 4, 5, 6],
       [4, 6, 3, 3, 0, 3]])
But it fails for multi-digit integers and floats:
>>> re.sub(r'\s+', ',', """[[ 0. 1. 6. 9. 1. 4.]
...  [ 4. 8. 2. 3. 6. 1.]]
... """)
'[[,0.,1.,6.,9.,1.,4.],[,4.,8.,2.,3.,6.,1.]],'
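because these have an extra "," at the beginning of each row, produced by the padding space after the opening bracket.
The solution doesn't necessarily need to be based on regexes; any other approach that works for unabridged (not shortened with ...) bool/int/float/complex arrays with 1-4 dimensions would be fine.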
Answer:
Here is a fairly manual solution:
import re
import numpy

def parse_array_str(array_string):
    tokens = re.findall(r'''            # Find all...
                        \[          |   # opening brackets,
                        \]          |   # closing brackets, or
                        [^\[\]\s]+      # sequences of other non-whitespace characters''',
                        array_string,
                        flags=re.VERBOSE)
    tokens = iter(tokens)

    # Chomp first [, handle case where it's not a [
    first_token = next(tokens)
    if first_token != '[':
        # Input must represent a scalar
        if next(tokens, None) is not None:
            raise ValueError("Can't parse input.")
        return float(first_token)  # or int(first_token), but not bool(first_token) for bools

    list_form = []
    stack = [list_form]

    for token in tokens:
        if token == '[':
            # enter a new list
            stack.append([])
            stack[-2].append(stack[-1])
        elif token == ']':
            # close a list
            stack.pop()
        else:
            stack[-1].append(float(token))  # or int(token), but not bool(token) for bools

    if stack:
        raise ValueError("Can't parse input - it might be missing text at the end.")

    return numpy.array(list_form)
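The inline comments in the parser note that float(token) can be swapped for int(token) but not for bool(token). The reason is that bool('False') is True, because any non-empty string is truthy. Below is a minimal sketch of a per-token converter that could stand in for float(token). The name convert_token and the order of the attempts (bool, then int, then float, then complex) are assumptions made for illustration, not part of the original answer.

```python
def convert_token(token):
    """Convert one whitespace-delimited token of a printed array to a Python value."""
    # Booleans have to be matched by name: bool('False') would be True.
    if token == 'True':
        return True
    if token == 'False':
        return False
    # Try the narrowest numeric type first, so '12' stays an int,
    # '3.' becomes a float and '1.+2.j' becomes a complex number.
    for caster in (int, float, complex):
        try:
            return caster(token)
        except ValueError:
            continue
    raise ValueError("Can't parse token: %r" % token)
```

With this helper, the two float(...) calls in the parser would become convert_token(...), and numpy.array would then choose the dtype from the converted values.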
Alternatively, a less manual solution that works by detecting where commas need to be inserted:
import ast
import re
import numpy

# Example input only; replace with the printed array you want to parse
array_string = """[[ 0. 1. 6. 9. 1. 4.]
 [ 4. 8. 2. 3. 6. 1.]]"""

pattern = r'''# Match (mandatory) whitespace between...
              (?<=\]) # ] and
              \s+
              (?= \[) # [, or
              |
              (?<=[^\[\]\s])
              \s+
              (?= [^\[\]\s]) # two non-bracket non-whitespace characters
           '''

# Replace such whitespace with a comma
fixed_string = re.sub(pattern, ',', array_string, flags=re.VERBOSE)

output_array = numpy.array(ast.literal_eval(fixed_string))
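As a quick check of the comma-insertion idea on the cases the question asks about (multi-digit values, floats, booleans, more than two dimensions), the sketch below wraps the same substitution in a helper. The name to_nested_lists and the sample strings are hypothetical. The helper stops at nested Python lists so that it runs without NumPy; passing its result to numpy.array would give the array.

```python
import ast
import re

# Whitespace that separates two rows ("] [") or two values ("1 2").
# Padding right after "[" or right before "]" is left alone.
SEPARATOR = re.compile(r'(?<=\])\s+(?=\[)|(?<=[^\[\]\s])\s+(?=[^\[\]\s])')

def to_nested_lists(array_string):
    """Insert commas at the separators, then evaluate the result safely."""
    return ast.literal_eval(SEPARATOR.sub(',', array_string))

# Hand-written samples in the style of printed arrays (illustrative only)
floats = "[[ 0.   1.  10.5]\n [ 4.   8.   2. ]]"
bools = "[[ True False]\n [False  True]]"
cube = "[[[ 1 22]\n  [ 3  4]]\n\n [[ 5  6]\n  [ 7 88]]]"

print(to_nested_lists(floats))
print(to_nested_lists(bools))
print(to_nested_lists(cube))
```

Because only whitespace between two values or between two rows is replaced, the leading padding that broke the naive `\s+` substitution is never turned into a comma.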