Python / Regex - Match .#,#. in a string
Question:
What regex can I use to match ".#,#." within a string? It may or may not be present in the string. Some examples with the expected output:
Test1.0,0.csv -> ('Test1', '0,0', 'csv') (Basic Example)
Test2.wma -> ('Test2', 'wma') (No Match)
Test3.1100,456.jpg -> ('Test3', '1100,456', 'jpg') (Basic with Large Number)
T.E.S.T.4.5,6.png -> ('T.E.S.T.4', '5,6', 'png') (Doesn't strip all periods)
Test5,7,8.sss -> ('Test5,7,8', 'sss') (No Match)
Test6.2,3,4.png -> ('Test6.2,3,4', 'png') (No Match, too many commas)
Test7.5,6.7,8.test -> ('Test7', '5,6', '7,8', 'test') (Double Match?)
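The last one isn't too important; I would only expect .#,#. to appear once. Most of the files I'm processing fall into the first through fourth examples, so those are the ones I care about most.
Thanks for the help!
Answer:
To allow multiple consecutive matches, use lookahead/lookbehind assertions: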
r'(?<=\.)\d+,\d+(?=\.)'
Example:
>>> re.findall(r'(?<=\.)\d+,\d+(?=\.)', 'Test7.5,6.7,8.test')
['5,6', '7,8']
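As a small sketch of why the lookarounds matter here, the comparison below runs a pattern that consumes the surrounding periods against the lookaround version on the same input. The variable names (`name`, `consuming`, `lookaround`) are only illustrative and are not part of the original answer.

```python
import re

name = 'Test7.5,6.7,8.test'

# Consuming pattern: each match swallows both surrounding periods, so the
# period after "5,6" is no longer available to start the next match.
consuming = re.findall(r'\.(\d+,\d+)\.', name)

# Lookaround pattern: the periods are only asserted, never consumed, so
# back-to-back matches can share a period.
lookaround = re.findall(r'(?<=\.)\d+,\d+(?=\.)', name)

print(consuming)   # ['5,6']
print(lookaround)  # ['5,6', '7,8']
```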
We can also use a lookahead to perform the split as desired:
import re

def split_it(s):
    # Split at each period that is immediately followed by "#,#."
    pieces = re.split(r'\.(?=\d+,\d+\.)', s)
    # Split the extension off the last piece
    pieces[-1:] = pieces[-1].rsplit('.', 1)
    return pieces
Test:
>>> print(split_it('Test1.0,0.csv'))
['Test1', '0,0', 'csv']
>>> print(split_it('Test2.wma'))
['Test2', 'wma']
>>> print(split_it('Test3.1100,456.jpg'))
['Test3', '1100,456', 'jpg']
>>> print(split_it('T.E.S.T.4.5,6.png'))
['T.E.S.T.4', '5,6', 'png']
>>> print(split_it('Test5,7,8.sss'))
['Test5,7,8', 'sss']
>>> print(split_it('Test6.2,3,4.png'))
['Test6.2,3,4', 'png']
>>> print(split_it('Test7.5,6.7,8.test'))
['Test7', '5,6', '7,8', 'test']
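If only a single .#,#. part is expected, as the question suggests, and tuples are preferred over lists, one possible alternative is to anchor a single pattern over the whole filename and make the numeric part an optional group. This is only a sketch under that assumption; `SINGLE` and `split_single` are hypothetical names, not part of the answer above.

```python
import re

# Hypothetical alternative: anchor the whole filename and make the
# ".#,#" part an optional group (assumes at most one such part matters).
SINGLE = re.compile(r'^(.*?)(?:\.(\d+,\d+))?\.([^.]+)$')

def split_single(s):
    m = SINGLE.match(s)
    if m is None:
        # No extension at all: return the name unchanged
        return (s,)
    # Drop the optional group when it did not take part in the match
    return tuple(g for g in m.groups() if g is not None)

print(split_single('Test1.0,0.csv'))      # ('Test1', '0,0', 'csv')
print(split_single('Test2.wma'))          # ('Test2', 'wma')
print(split_single('T.E.S.T.4.5,6.png'))  # ('T.E.S.T.4', '5,6', 'png')
print(split_single('Test6.2,3,4.png'))    # ('Test6.2,3,4', 'png')
```

Unlike split_it, this version picks out only the last .#,#. part: 'Test7.5,6.7,8.test' comes back as ('Test7.5,6', '7,8', 'test'), so the split-based approach is the better fit when double matches need to be separated.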