使用正则表达式将客户评论中的点替换为逗号


问题内容

我需要写一个正则表达式来替换'.'','在一些患者对毒品的意见。他们在提到副作用后应该使用逗号,但是其中一些使用点。例如:

text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."

我编写了一个正则表达式代码来检测一个单词(例如,头痛)或两个单词(例如,噩梦),并用两个点包围:

检测被两个点包围的单词:

text=  re.sub (r'(\.)(\s*\w+\s*\.)',r',\2 ', text )

检测被两个点包围的两个词:

text =  re.sub (r'(\.)(\s*\w+\s\w+\s*\.)',r',\2 ', text11 )

这是输出:

the drug side-effects are: night mare, nausea,  night sweat.  bad dream, dizziness,  severe headache.   I suffered, she suffered.  she told I should change it.

但是应该是:

the drug side-effects are: night mare, nausea,  night sweat,  bad dream, dizziness,  severe headache.   I suffered. she suffered.  she told I should change it.

我的代码并没有取代dotnight sweat to ','。我另外,if a sentence starts with a subject pronoun (such as I and she) I do not want to change dot to comma after it, even if it has two words (such as, I suffered)。我不知道如何将此条件添加到我的代码中。

有什么建议吗?谢谢 !


问题答案:

您可以使用以下模式:

\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)

这匹配一个点,然后捕获一个或两个单词,其中第一个不是您提到的代词(您很可能需要扩展该列表)。其后必须是既不是单词字符也不是空格(例如. ! :
,)或字符串结尾的字符。

然后,您将不得不替换为 ,\1

在python中

import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)

产出

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache.  I suffered. she suffered. she told I should change it.

这可能不是绝对的故障保护,并且对于某些边缘情况,您可能必须扩展模式。