I've been looking around but have yet to find a solution. I'm trying to scrape an HTML document and get the text between two comments, but I haven't been able to do this successfully so far.
I'm using PHP and have tried the PHP Simple HTML DOM Parser, which is recommended here many times, but I can't seem to get it to do what I want.
Here's (part of) the page that I wish to parse:
<div> <!-- blah --> text <!-- end blah --> Text I want <!-- blah --> text <!-- end blah --></div>

Thanks
- Could you show us your current code? (Randell, Aug 26, 2009 at 6:01)
2 Answers
Assuming that each comment is different (i.e. "blah" is not the same in the first and second sections), you can use a couple of simple strpos calls to grab everything between them. Regular expressions are not necessary.
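$startStr = '<!-- end blah1 -->';
$endStr = '<!-- start blah2 -->';
$startPos = strpos($HTML, $startStr);
$endPos = strpos($HTML, $endStr);
// strpos returns false if a marker is missing, so check before slicing
if ($startPos !== false && $endPos !== false) {
    $startPos += strlen($startStr); // skip past the opening comment itself
    $textYouWant = substr($HTML, $startPos, $endPos - $startPos);
}

If the two sets of comments are the same, you'll need to modify this to find the second "blah" using strpos's offset parameter.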
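For the case where both sets of comments are the same (as in the question's sample), a minimal sketch of the offset approach might look like this. `textBetween` and its `$occurrence` parameter are hypothetical names chosen for illustration; the key idea is passing a starting offset as the third argument to `strpos`, so the end marker is only searched for after the start marker.

```php
<?php
// Hypothetical helper: returns the text between the N-th occurrence of
// $startStr and the next occurrence of $endStr after it, or null if
// either marker is missing.
function textBetween(string $html, string $startStr, string $endStr, int $occurrence = 1): ?string
{
    $pos = -1;
    // Walk forward to the requested occurrence of the start marker,
    // using strpos's offset parameter to skip earlier matches.
    for ($i = 0; $i < $occurrence; $i++) {
        $pos = strpos($html, $startStr, $pos + 1);
        if ($pos === false) {
            return null;
        }
    }
    $startPos = $pos + strlen($startStr);

    // Only look for the end marker after the start marker.
    $endPos = strpos($html, $endStr, $startPos);
    if ($endPos === false) {
        return null;
    }
    return substr($html, $startPos, $endPos - $startPos);
}

$html = '<div> <!-- blah --> text <!-- end blah --> Text I want <!-- blah --> text <!-- end blah --></div>';
echo trim(textBetween($html, '<!-- end blah -->', '<!-- blah -->')); // prints "Text I want"
```

Because the search for `<!-- blah -->` starts after the first `<!-- end blah -->`, the earlier `<!-- blah -->` at the start of the div is skipped automatically.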
Maybe you can use regular expressions?
$text = '<div> <!-- blah --> text <!-- end blah --> Text I want <!-- blah --> text <!-- end blah --></div>';
$regex = '/(<!-- end blah -->)(.*?)(<!-- blah -->)/ims';
$match = preg_match_all($regex, $text, $matches);
// $matches[2] holds the text captured between the two comments
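Since the question mentions trying a DOM parser, here is a minimal sketch of a DOM-based alternative using PHP's built-in `DOMDocument` and `DOMXPath` (not the Simple HTML DOM Parser's API). The DOM extension exposes HTML comments as nodes, so you can select the marker comment with XPath and read the sibling nodes after it. It assumes the markers are exactly `<!-- end blah -->` and `<!-- blah -->` and that they are siblings inside the same element, as in the sample.

```php
<?php
$html = '<div> <!-- blah --> text <!-- end blah --> Text I want <!-- blah --> text <!-- end blah --></div>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Select every comment node whose content is exactly " end blah ".
foreach ($xpath->query('//comment()[. = " end blah "]') as $comment) {
    $text = '';
    // Walk forward through following siblings until the next "blah" comment.
    for ($node = $comment->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node instanceof DOMComment && $node->data === ' blah ') {
            $results[] = trim($text);
            break;
        }
        $text .= $node->textContent;
    }
}

// $results now holds one entry per "end blah" ... "blah" pair: 'Text I want'
print_r($results);
```

The last `<!-- end blah -->` has no following `<!-- blah -->` sibling, so it contributes nothing to `$results`.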
