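I noticed that str_getcsv does not seem to keep the double quotes around the first value it receives, even when the string data is passed in that way.
In the example below, the first value of the 3rd row is "Small Box, But Smaller", but after running it through str_getcsv it becomes Small Box, But Smaller (without double quotes). Like this: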
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","Not sure why we need this.",0
CSV;
// split string into rows (don't use explode in case multi-line values exist)
$csvRows = str_getcsv($csvString, "\n"); // parse rows
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Output:
Array
(
[0] => Title,"Description",Quantity
[1] => Small Box,"For storing magic beans.",2
[2] => Small Box, But Smaller,"Not sure why we need this.",0
)
The problem this causes is that if each row is now parsed with str_getcsv, the comma in the first value splits it into two columns. Continuing:
foreach ($csvRows as &$csvRow) {
    // parse each row into values and save over original array value
    $csvRow = str_getcsv($csvRow);
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Output:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box
[1] => But Smaller
[2] => Not sure why we need this.
[3] => 0
)
)
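The problem is with the last array value, which is an array of 4 keys instead of 3. It was split on the comma in the value "Small Box, But Smaller".
On the other hand, parsing just a single row string works: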
$csvRowData = '"Small Box, But Smaller","Not sure why we need this.",0';
$csvValues = str_getcsv($csvRowData);
echo '<pre>';
print_r($csvValues);
echo '</pre>';
Output:
Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
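Why does this happen, and how do I solve the problem for multi-line CSV data? Is there a best practice for handling multi-line CSV data when it is a string and is not read directly from a file? Also, I need to handle multi-line values such as "foo\n bar", so I can't just use explode() instead of the first str_getcsv().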
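After much headache, I think I understand the problem now. According to the PHP maintainers, "str_getcsv() is designed to parse a single CSV record into fields" (see https://bugs.php.net/bug.php?id=55763). I found that using str_getcsv() on multiple rows causes problems that are not well documented, such as the loss of the enclosing double quotes shown above.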
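To make the cause concrete, here is a minimal sketch of what passing "\n" as the delimiter does. The CSV is a shortened two-line version of the example in the question, and the enclosure and escape arguments are written out explicitly; that is my choice for the sketch, not something the question does. With "\n" as the delimiter the whole string is one record and each line is one field, so the quote handling is only applied where a field (that is, a line) begins.

```php
<?php
// Shortened version of the CSV from the question (two lines instead of three).
$csvString = "\"Title\",\"Description\",Quantity\n"
           . "\"Small Box, But Smaller\",\"Not sure why we need this.\",0";

// "\n" is used as the FIELD delimiter here, so str_getcsv() sees one record
// whose fields are the lines. A line that starts with the enclosure character
// has that first quoted chunk unwrapped; the rest of the line is kept as-is.
$fields = str_getcsv($csvString, "\n", '"', '\\');

foreach ($fields as $field) {
    echo $field, PHP_EOL;
}
// Matches the shape of the output in the question:
// Title,"Description",Quantity
// Small Box, But Smaller,"Not sure why we need this.",0
```

Once the quotes around the first value are gone, a second str_getcsv() pass has no way to tell that the comma in "Small Box, But Smaller" was part of the value.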
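I worked around this by creating a temporary file and writing the CSV content to it. I then read the file with fgetcsv(), which did not cause the problems I described above. Example code: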
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","This value
contains
multiple
lines.",0
CSV;
// ^ notice the multiple lines in the last row's value
// create a temporary file
$tempFile = tmpfile();
// write the CSV to the file
fwrite($tempFile, $csvString);
// go to first character
fseek($tempFile, 0);
// track CSV rows
$csvRows = array();
// read the CSV temp file line by line
while (($csvColumns = fgetcsv($tempFile)) !== false) {
    // push columns to array (really it would be more memory-efficient
    // to process the data here and not append to an array)
    $csvRows[] = $csvColumns;
}
// Close and delete the temp file
fclose($tempFile);
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Which results in:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => This value
contains
multiple
lines.
[2] => 0
)
)
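The same idea can be sketched with PHP's built-in php://temp stream instead of tmpfile(). php://temp keeps the data in memory and only spills to a temporary file once it grows past a size threshold. The helper name parseCsvString() and its parameters are hypothetical, chosen just for this illustration, and the delimiter, enclosure and escape arguments are passed explicitly rather than relying on defaults.

```php
<?php
/**
 * Hypothetical helper (the name is illustrative, not a PHP built-in):
 * parse a CSV string containing several records by reading it through
 * a php://temp stream with fgetcsv().
 */
function parseCsvString($csv, $delimiter = ',', $enclosure = '"', $escape = '\\')
{
    $handle = fopen('php://temp', 'r+');
    if ($handle === false) {
        throw new RuntimeException('Could not open php://temp');
    }

    fwrite($handle, $csv);
    rewind($handle); // back to the first byte before reading

    $rows = array();
    while (($fields = fgetcsv($handle, 0, $delimiter, $enclosure, $escape)) !== false) {
        if ($fields === array(null)) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        $rows[] = $fields;
    }

    fclose($handle);
    return $rows;
}

// Usage: the last record has a value that spans several lines.
$rows = parseCsvString(
    "\"Title\",\"Description\",Quantity\n"
    . "\"Small Box, But Smaller\",\"This value\ncontains\nmultiple\nlines.\",0\n"
);
print_r($rows);
```

As with the temporary-file version, fgetcsv() decides where each record ends, so quoted values may contain both commas and line breaks.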
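I'll also add that I found some options on GitHub, including two major projects for PHP 5.4 and PHP 5.5. However, I'm still using PHP 5.3 and only saw options with limited activity. In addition, some of them process CSV strings by writing them to a file and reading them back out.
I should also note that the PHP documentation has some comments about str_getcsv() not being RFC-compliant: http://php.net/manual/en/function.str-getcsv.php. The same seems to be true for fgetcsv(), yet the latter did meet my needs, at least in this case.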
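I don't know why PHP_EOL does not work properly with your code on my server, but I have run into this problem before.
The approach I took is as follows.
First, I like to make sure that all of my fields are surrounded by double quotes, regardless of the value in the field, so using your example text (slightly modified):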
// multi-line csv string
$csvString = <<<CSV
"Title","Description","Quantity"
"Small Box","For storing magic beans.","2"
"Small Box, But Smaller","Not sure why we need this.","0"
"a","\n","b","c"
CSV;
$csvString .= '"a","' . "\n" . '","' . PHP_EOL . '","c"';
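Second, I target any solitary PHP_EOL that may be lingering inside values, so that the remaining PHP_EOL sequences (the ones between rows) can be replaced with "\r\n":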
// Clear any solo end of line characters that are within values
$csvString = str_replace('","' . PHP_EOL . '"', '",""',$csvString);
$csvString = str_replace('"' . PHP_EOL . '","', '"","',$csvString);
$csvString = str_replace('"' . PHP_EOL . '"', '"'. "\r\n" . '"',$csvString);
Finally, this allows me to use PHP's explode() function and display the output:
$csvArr = explode("\r\n",$csvString);
foreach ($csvArr as &$csvRow) {
    // parse each row into values and save over original array value
    $csvRow = str_getcsv($csvRow);
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvArr);
echo '</pre>';
Which outputs:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
[3] => Array
(
[0] => a
[1] =>
[2] => b
[3] => c
)
[4] => Array
(
[0] => a
[1] =>
[2] =>
[3] => c
)
)
As you can see from the output, the "\n" newline characters are not targeted, only PHP_EOL.