PHP Simple HTML DOM Parser


Problem with finding

Q: The element is not found in a case like this:
$html->find('div[style=padding: 0px 2px;] span[class=rf]');

A: If an attribute value in a selector contains a space, put it in quotes:
$html->find('div[style="padding: 0px 2px;"] span[class=rf]');
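When selectors are assembled from variables, it is easy to forget the quotes. The following is a minimal sketch of a helper that always quotes the attribute value. The name attr_selector is hypothetical and not part of the library, and the sketch assumes the value itself contains no double quotes.

```php
<?php
// Hypothetical helper (not part of the library): builds "tag[attr="value"]"
// with the value always wrapped in double quotes, so spaces are safe.
function attr_selector($tag, $attr, $value)
{
    return sprintf('%s[%s="%s"]', $tag, $attr, $value);
}

// Rebuild the selector from the answer above, piece by piece.
$selector = attr_selector('div', 'style', 'padding: 0px 2px;')
          . ' '
          . attr_selector('span', 'class', 'rf');

echo $selector, "\n";
```

The resulting string can then be passed to $html->find() as usual.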

Problem with hosting

Q: On my local server everything works fine, but when I put it on my external server it doesn't work.

A: The "file_get_html" function is a wrapper around "file_get_contents", so "allow_url_fopen" must be enabled in "php.ini" to access files via HTTP or FTP. However, some hosting vendors disable PHP's "allow_url_fopen" flag for security reasons. PHP also has excellent support for the "curl" library, which can do the same job: use curl to fetch the page, then call "str_get_html" to create the DOM object.

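$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://????????');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);   // return the page as a string instead of printing it
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);  // give up connecting after 10 seconds
$str = curl_exec($curl);                         // false on failure
$html = str_get_html($str);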
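If the same script has to run on hosts with and without "allow_url_fopen", the choice between the two transports can be made at run time. The following is a minimal sketch under that assumption. The function name fetch_page and its $timeout parameter are hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: returns the page body as a string, or false on failure.
function fetch_page($url, $timeout = 10)
{
    // ini_get() returns a string such as "1", "On" or ""; normalise it to a boolean.
    $fopenAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($fopenAllowed) {
        // Same transport the library's file wrapper relies on.
        return @file_get_contents($url);
    }

    if (!function_exists('curl_init')) {
        return false; // neither transport is available
    }

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $body = curl_exec($curl); // false on failure

    // The handle is released when $curl goes out of scope.
    return $body;
}
```

When the return value is not false, it can be handed to str_get_html() exactly as in the snippet above.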

Behind a proxy

Q: My server is behind a proxy and I can't use file_get_contents because it returns an unauthorized error.

A: Thanks to Shaggy for providing the solution:
// Define a context for HTTP.
$context = array(
    'http' => array(
        // This needs to be the server and the port of the NTLM Authentication Proxy Server.
        'proxy' => 'addresseproxy:portproxy',
        'request_fulluri' => true,
    ),
);

$context = stream_context_create($context);
$html = file_get_html('', false, $context);
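The context can be built and inspected before it is used, which helps when debugging proxy settings. The following is a minimal sketch. The function name proxy_context and the host proxy.example.com:8080 are placeholders, and the "tcp://" prefix follows the form shown in PHP's documentation for the HTTP context "proxy" option. Whether your proxy needs it is an assumption to verify.

```php
<?php
// Hypothetical helper: builds a stream context that routes HTTP requests
// through the given proxy. Host and port are placeholders.
function proxy_context($host, $port)
{
    $options = array(
        'http' => array(
            'proxy'           => 'tcp://' . $host . ':' . $port,
            'request_fulluri' => true, // send the full URI in the request line
        ),
    );
    return stream_context_create($options);
}

$context = proxy_context('proxy.example.com', 8080);

// Inspect what was actually stored in the context.
$stored = stream_context_get_options($context);
echo $stored['http']['proxy'], "\n";
```

The returned context is passed as the third argument to file_get_html(), as in the solution above.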

Memory leak!

Q: This script leaks memory badly. After it finishes running, the DOM object is not cleaned up from memory properly.

A: Because of the PHP 5 circular-reference memory leak, you must call clear() on each DOM object (for example $html->clear()) to free its memory if you call file_get_html() more than once.


$html = file_get_html(...);
// do something...
$html->clear();
unset($html);
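To see why an explicit clear() helps, consider a toy tree in which each child points back to its parent. PHP versions before 5.3 rely on reference counting alone and cannot free such cycles by themselves. The class DemoNode below is an illustrative stand-in, not the library's actual implementation.

```php
<?php
// Illustrative stand-in, NOT the library's code: a node that points to its
// parent while the parent points back to it, forming a reference cycle.
class DemoNode
{
    public $parent = null;
    public $children = array();

    public function add(DemoNode $child)
    {
        $child->parent = $this;      // child -> parent
        $this->children[] = $child;  // parent -> child (closes the cycle)
    }

    // Break every link so plain reference counting can free the tree.
    public function clear()
    {
        foreach ($this->children as $child) {
            $child->clear();
        }
        $this->children = array();
        $this->parent = null;
    }
}

$root = new DemoNode();
$leaf = new DemoNode();
$root->add($leaf);

// After this call no node references another one any more.
$root->clear();
```

Once the links are gone, unsetting the variables is enough for the memory to be reclaimed.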

Author: S.C. Chen
Original idea is from Jose Solorzano's HTML Parser for PHP 4.
Contributions by: Yousuke Kumakura, Vadim Voituk, Antcs