Special characters problem using Python Unicode

Question:

#!/usr/bin/env python
# -*- coding: utf_8 -*-

def splitParagraphIntoSentences(paragraph):
    ''' break a paragraph into sentences
        and return a list '''
    import re
    # to split by multiple characters,
    # regular expressions are easiest (and fastest)
    # note: flags such as re.UNICODE belong in re.compile();
    # the second positional argument of split() is maxsplit, not flags
    sentenceEnders = re.compile(r'[.!?][\s]{1,2}(?=[A-Z])', re.UNICODE)
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList


if __name__ == '__main__':
    p = "While other species (e.g. horse mango, M. foetida) are also grown ,Mangifera indica – the common mango or Indian mango – Sheffield’s only mango tree is valued at £9.2 billion."

    sentences = splitParagraphIntoSentences(p)
    for s in sentences:
        print(s.strip())

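Expected output: While other species (e.g. horse mango, M. foetida) are also grown ,Mangifera indica – the common mango or Indian mango – Sheffield’s only mango tree is valued at £9.2 billion.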

Achieved output: the same sentence, except that the special characters come out garbled.

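Ignore the meaning of the sentence; the main point is that the code cannot handle special characters such as “–”, “£”, “’”, etc. I have tried setting other encodings (ascii, utf-32, cp-500, iso8859_15 and utf-8) in the sitecustomize.py file and in this code, but that did not solve it. Sorry, I am new to Python. Thanks in advance for your help.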

Answer:

Found a solution.

The code below works (Python 2, where the `unicode` type exists):

p = p.encode('utf-8') if isinstance(p, unicode) else p
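
As a rough illustration of what that line does, the sketch below wraps the same expression in a small helper and shows how decoding the resulting bytes with a mismatched codec corrupts exactly the kind of characters mentioned in the question. The names `to_utf8_bytes` and `text_type` and the sample string are made up for this example, and the cp1252 decoding is only one assumed way such garbling can arise, not necessarily what happened in the asker's environment.

```python
# -*- coding: utf-8 -*-
import sys

# `unicode` exists only on Python 2; on Python 3 the text type is `str`.
# The name `unicode` is never evaluated on Python 3 because of the conditional.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_utf8_bytes(p):
    """Hypothetical helper: return p as UTF-8 bytes, leaving byte strings untouched."""
    return p.encode('utf-8') if isinstance(p, text_type) else p


# Sample text with a pound sign, an en dash and a right single quote
# (written as escapes so the source file itself stays ASCII).
sample = u"valued at \u00a39.2 billion \u2013 Sheffield\u2019s only mango tree"
encoded = to_utf8_bytes(sample)

# Decoding with the same codec that was used for encoding restores the text.
roundtrip = encoded.decode('utf-8')

# Decoding the same bytes with a different codec (cp1252 is an assumption here)
# garbles the special characters: the pound sign gains a stray U+00C2 prefix.
garbled = encoded.decode('cp1252')
```

Here `roundtrip == sample` holds, while `garbled` differs from `sample` only at the three special characters. The general takeaway is to encode and decode with the same codec at every boundary (source file, terminal, files on disk).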
