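Using a regular expression to replace dots with commas in customer reviews
Question:
I need to write a regular expression that replaces '.' with ',' in some patients' comments about drugs. They should use a comma after each side effect they mention, but some of them use a dot instead. For example: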
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it."
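I wrote regex code to detect one word (e.g., headache) or two words (e.g., night mare) surrounded by two dots.
Detecting one word surrounded by two dots:
text = re.sub(r'(\.)(\s*\w+\s*\.)', r',\2 ', text)
Detecting two words surrounded by two dots: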
text = re.sub(r'(\.)(\s*\w+\s\w+\s*\.)', r',\2 ', text)
This is the output:
the drug side-effects are: night mare, nausea, night sweat. bad dream, dizziness, severe headache. I suffered, she suffered. she told I should change it.
But it should be:
the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it.
My code does not replace the dot after night sweat with ','. Also, if a sentence starts with a subject pronoun (such as I or she), I do not want to change the dot before it to a comma, even if the sentence has only two words (such as "I suffered"). I don't know how to add this condition to my code.
Any suggestions? Thanks!
Answer:
You can use the following pattern:
\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)
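This matches a dot and then captures one or two words, where the first word is not one of the pronouns you mentioned (you will most likely need to extend that list). The captured words must be followed either by a character that is neither a word character nor whitespace (for example ".", "!", ":" or ",") or by the end of the string. That final check is a lookahead, so the following character is not consumed by the match.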
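To make the parts described above easier to follow, here is a sketch of the same pattern written with re.VERBOSE so each piece carries a comment. The names PATTERN and sample are illustrative only, and the shortened sample sentence is an assumption made for brevity; the replacement string ,\1 is the one used in this answer.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE for commenting.
PATTERN = re.compile(
    r"""
    \.                      # the dot that may become a comma
    (                       # group 1: the text kept after the comma
        \s*                 #   optional whitespace after the dot
        (?!(?:i|she)\b)     #   the first word must not be one of these pronouns
        \w+                 #   first word
        (?:\s+\w+)?         #   optional second word
        \s*                 #   optional trailing whitespace
    )
    (?=[^\w\s]|$)           # next: punctuation or end of string, not consumed
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Shortened, illustrative sample (not the full text from the question).
sample = "side-effects: night mare. nausea. severe headache. I suffered."
result = PATTERN.sub(r",\1", sample)
print(result)
```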
You then replace each match with ,\1
In Python:
import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)
Output:
the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it.
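A likely reason the original attempt skipped the dot after night sweat is that its patterns consume the closing dot, and re.sub never reuses a character that an earlier match already consumed. The sketch below contrasts the two approaches on a made-up string; the sample and both simplified one-word patterns are assumptions for illustration, not the exact patterns from the question.

```python
import re

sample = "a. b. c. d."

# Consuming the closing dot: matches cannot overlap, so every other dot is skipped.
consuming = re.sub(r"\.(\s*\w+\s*)\.", r",\1.", sample)

# Asserting the closing dot with a lookahead leaves it free to start the next match.
lookahead = re.sub(r"\.(\s*\w+\s*)(?=\.)", r",\1", sample)

print(consuming)
print(lookahead)
```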
This may not be completely foolproof, and you may have to extend the pattern for some edge cases.
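One way to make the pronoun list easier to extend is to build the pattern from a list of words. This is only a sketch: build_pattern, dots_to_commas and the PRONOUNS tuple are hypothetical names, and which words belong in the list is an assumption you would need to adjust for your data.

```python
import re

# Hypothetical list; which words should block the replacement is an assumption.
PRONOUNS = ("i", "you", "he", "she", "it", "we", "they")


def build_pattern(pronouns):
    """Build the dot-to-comma pattern, excluding the given words as first word."""
    alternation = "|".join(re.escape(p) for p in pronouns)
    return re.compile(
        r"\.(\s*(?!(?:" + alternation + r")\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)",
        re.IGNORECASE,
    )


def dots_to_commas(text, pronouns=PRONOUNS):
    """Replace a dot with a comma when one or two non-pronoun words follow it."""
    return build_pattern(pronouns).sub(r",\1", text)


sample = "side-effects: nausea. dizziness. bad dream. they stopped. we agreed."
print(dots_to_commas(sample))
```

Sentences of three or more words are left alone regardless of the list, because the pattern only captures one or two words before the next punctuation mark.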