Parsing XML using PHP4
June 6, 2005
Posted by Burhan in: PHP, Tutorials, XML
This tutorial will explain how to parse (that is, read and interpret) information from an XML file using PHP. I will discuss the very basics of XML (mainly structure), and then jump right in to reading and parsing XML files. This is not a tutorial on XML itself, just on parsing XML with PHP.
First, the prerequisites: the things you will need to have in place.
- Web server with PHP installed. No special extensions are needed for this tutorial.
- Ability to save files on the web server (either upload them, or save them directly via FTP).
- A decent text editor, preferably one with syntax highlighting for PHP.
XML File Structure
eXtensible Markup Language, or XML as it's commonly called, is primarily used to facilitate the interchange of information between environments that are not natively compatible (that is, they don't support each other's default file format). An example would be a database server that doesn't import Access files and has its own proprietary data format.
The key word in XML is extensible. This means that the structure of the file is left entirely up to its creator. There are only a few rules you must follow to create XML (one of them being that there can be only one root element). Other than that, the author has free rein over the tags, attributes, etc. that he uses. One more thing to understand about XML is that it has no fixed tag set. Unlike HTML, which has a predefined set of tags (<p>, etc.), XML has no preset tags; it is up to the author to define the tags for a file. In XML, a tag can be almost anything.
A few more rules about XML and we will be on our way to our PHP code. XML documents must be well-formed. This means that there can be only one root element (the topmost element), all child elements must be nested properly (<p>foo<b>bar</b></p>, not <p>foo<b>bar</p></b>), and all elements must be closed, either with an end tag or, for an empty element, with the self-closing form (<br/>).
In XML, an element is also referred to as a node.
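An XML parser refuses documents that break these rules. As a quick illustration, the sketch below hands the two <p> fragments from the rule above to PHP's expat-based functions (the same ones used later in this tutorial) and reports the parser's error message. The helper name check_well_formed() is our own invention, not a PHP built-in.

```php
<?php
// Sketch: ask the expat parser whether a string is well-formed XML.
// check_well_formed() is a hypothetical helper written for this example.
// Returns "" when the document parses, otherwise the parser's error message.
function check_well_formed($xml)
{
    $parser = xml_parser_create();
    $message = "";
    // The third argument (true) says this string is the whole document.
    if (!xml_parse($parser, $xml, true)) {
        $message = xml_error_string(xml_get_error_code($parser))
                 . " at line " . xml_get_current_line_number($parser);
    }
    xml_parser_free($parser);
    return $message;
}

// Properly nested: accepted.
echo check_well_formed("<p>foo<b>bar</b></p>") === "" ? "well-formed\n" : "broken\n";
// Improperly nested: rejected with an error message.
echo check_well_formed("<p>foo<b>bar</p></b>") . "\n";
```

Two root elements side by side (for example "<a/><b/>") are rejected the same way.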
Once these simple rules are understood, we can start creating an XML document that an XML parser will understand.
Creating our XML File
Almost all the XML examples that I've seen use an address book, but just to be different (and to make things interesting), we are going to create an XML file that contains information about the images in a folder, and we will use it to create a very simple gallery.
The first step is to decide what information we want to store about each image. There are the usual suspects (the file name, the name of the image, the size), and we want to make sure this information is in our XML file. As I mentioned above, every XML document has a root tag, which starts and ends the file. Let's call this tag <imageinfo>.
The next step is to decide what attributes a tag will have. An attribute is a property of the tag. For example, the <img> tag in HTML has the attribute src, which tells the user agent (browser) where to find the image.
For our example, we will create a size tag and give it the attributes width and height.
Most of the time developers are more concerned with parsing XML files than with writing them; however, knowing the basics of what to expect in an XML file helps when debugging a parser.
Let's write our very basic XML file:
[xml]<?xml version="1.0"?>
[/xml]
The line <?xml version="1.0"?> is called the XML declaration. It identifies the file as XML to a parser and, when present, must be the very first line of the file.
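To have something to experiment with, the sketch below writes a hypothetical sample.xml. Only the element names imageinfo, filename, size (with width and height attributes) and name come from this tutorial; the <image> wrapper element, the order of the child elements, the file names and the dimensions are assumptions made up for illustration.

```php
<?php
// Sketch: write a hypothetical sample.xml to the current directory.
// Assumed: the <image> wrapper, the child order (filename, size, name),
// and all file names and dimensions.
$xml = '<?xml version="1.0"?>
<imageinfo>
  <image>
    <filename>sunset.jpg</filename>
    <size width="640" height="480"/>
    <name>Sunset over the bay</name>
  </image>
  <image>
    <filename>garden.jpg</filename>
    <size width="800" height="600"/>
    <name>The back garden</name>
  </image>
</imageinfo>
';

// fopen()/fwrite() work on PHP4 as well as later versions.
$fp = fopen("sample.xml", "w");
if (!$fp) {
    die("cannot write sample.xml");
}
fwrite($fp, $xml);
fclose($fp);
echo "wrote " . strlen($xml) . " bytes to sample.xml\n";
```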
Parsing with PHP
The PHP engine comes with built-in functions to enable XML parsing using the expat library written by James Clark.
These functions allow us to create our own XML parser. The XML functions we will use are:
- xml_parser_create()
- xml_set_element_handler()
- xml_set_character_data_handler()
- xml_parse()
- xml_parser_free()
- xml_error_string()
- xml_get_error_code()
- xml_get_current_line_number()
We will also use a few general (non-XML) PHP functions, such as fopen(), fread(), feof() and eregi_replace(), which appear in the code below.
Creating our Parser
The first step is to create and set up our parser. The xml_parser_create() function creates the parser and returns a handle to it. We then have to set up the different handlers so that the parser knows what to do with each type of information (an opening tag, a closing tag, the text between tags, etc.). Let's first make sure that we can create our parser, which is perhaps the least confusing line of code:
[php]if (!($xmlparser = xml_parser_create()))
{
    die("Cannot create parser");
}[/php]
This code simply checks whether we can create a parser. If we can't, there is no point in going any further, so it quits with an appropriate message. If your script quits here with the error message, then your PHP installation isn't set up with the expat library. Most Unix/Linux-based servers include the expat library as part of their PHP install. Check the XML reference section of the PHP manual for instructions on installing it. Alternatively, you can send a support request to your host/ISP's help desk.
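If you would rather check before creating a parser, a small diagnostic page such as the following sketch reports whether the XML functions exist at all. It uses only function_exists() and extension_loaded(); the message wording is our own.

```php
<?php
// Sketch: a diagnostic page reporting whether PHP's XML extension
// (built on expat) is available, without creating a parser.
$have_xml = function_exists("xml_parser_create");

if ($have_xml) {
    echo "XML parser functions are available.\n";
} else {
    echo "XML parser functions are missing; check your PHP build.\n";
}
// "xml" is the name PHP uses for this extension.
echo "xml extension loaded: " . (extension_loaded("xml") ? "yes" : "no") . "\n";
```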
Once we have created our parser, it is time to configure it to handle our XML file. The xml_set_element_handler() function takes three arguments. The first is a handle to our parser ($xmlparser). The second is the name of a function that the parser will call when it finds an opening tag, and the last is the name of a function that the parser will call when it reaches a closing tag. We are going to write the functions that will be called for each opening and closing tag. It sounds scary, but it really is very straightforward.
Setting up tag handlers
First, let's write the function that will be called for an opening tag. The name of the function can be any valid PHP function name. The function must accept three arguments, which we will call $parser, $name and $attribs:
$parser = handle to our parser
$name = name of the current tag
$attribs = an array containing any attributes of the current tag
We don't have to worry about calling the function; the parser does that automatically as it goes through our XML file. With that in mind, let's write our start tag function, which we will call start_tag (how creative, I know).
[php]function start_tag($parser, $name, $attribs) {
    echo "Current tag : " . $name . "<br />\n";
    // $attribs is always an array; only list it when the tag has attributes
    if (!empty($attribs)) {
        echo "Attributes :<br />\n";
        foreach ($attribs as $key => $val) {
            echo "Attribute " . $key . " has value " . $val . "<br />\n";
        }
    }
}[/php]
Next, we will write the function that will be called when an ending tag is reached. Like our opening tag function, it can have any name that's valid in PHP. The ending tag function must take two parameters, $parser and $name:
$parser = handle to our parser
$name = name of the current tag
Let's write our ending tag function (which we will call end_tag):
[php]function end_tag($parser, $name) {
    echo "Reached ending tag " . $name . "<br />\n";
}[/php]
We have now taken care of all the requirements for the xml_set_element_handler() function, so we can call it:
[php]
xml_set_element_handler($xmlparser, "start_tag", "end_tag");
[/php]
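To see that the parser, not our code, calls these functions, the sketch below swaps the echo statements for handlers (hypothetical names record_start and record_end) that append each event to an array. It parses a one-line inline document instead of a file.

```php
<?php
// Sketch: record the order in which the parser calls the element handlers.
// record_start/record_end are hypothetical names for this example.
$events = array();

function record_start($parser, $name, $attribs) {
    global $events;
    $events[] = "start " . $name;
}

function record_end($parser, $name) {
    global $events;
    $events[] = "end " . $name;
}

$p = xml_parser_create();
xml_set_element_handler($p, "record_start", "record_end");
xml_parse($p, '<imageinfo><size width="640" height="480"/></imageinfo>', true);
xml_parser_free($p);

echo implode("\n", $events) . "\n";
```

The self-closing <size .../> tag still triggers both the start and the end handler, and the tag names arrive in upper case because of the parser's default case folding.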
Setting up content (data) handlers
We have taken care of our starting and ending tags, so now we must deal with the actual content of a tag. The xml_set_character_data_handler() function sets up the character data handling function for the parser. Since we know that our data is going to be character based, we will use this function. There are different xml_set functions for different types of data; you can view the list of data handler functions in the PHP manual.
The xml_set_character_data_handler() function takes two arguments: a handle to the parser, and the name of the function to call for character data. As with the opening and closing tag functions, we have to write the character data handling function ourselves. It must accept two arguments, $parser and $data:
[php]function tag_contents($parser, $data) {
    echo "Contents : " . $data . "<br />\n";
}[/php]
Once the function is written, we can set up the parser to use it:
[php]xml_set_character_data_handler($xmlparser, "tag_contents");[/php]
Our functions just print out information about each tag. We will later modify them so that we can actually do something useful with our information. At this stage, we just want to make sure that our parser is working correctly.
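One thing to keep in mind: expat may call the character data handler several times for a single run of text (entity references such as &amp; and the chunk boundaries passed to xml_parse() are common causes), so printing $data directly can split a value into pieces. A common approach, sketched below with hypothetical handler names, is to append to a buffer and use the complete text when the element closes.

```php
<?php
// Sketch: collect character data in a buffer, because the handler may be
// called more than once for one element's text.
// buf_start/buf_data/buf_end are hypothetical names for this example.
$buffer = "";
$texts = array();   // tag name => complete text of that tag

function buf_start($parser, $name, $attribs) {
    global $buffer;
    $buffer = "";             // new element: start with an empty buffer
}

function buf_data($parser, $data) {
    global $buffer;
    $buffer .= $data;         // append each piece as it arrives
}

function buf_end($parser, $name) {
    global $buffer, $texts;
    $texts[$name] = $buffer;  // the element is closed: its text is complete
    $buffer = "";
}

$p = xml_parser_create();
xml_set_element_handler($p, "buf_start", "buf_end");
xml_set_character_data_handler($p, "buf_data");
xml_parse($p, "<image><name>Sun &amp; sand</name></image>", true);
xml_parser_free($p);

echo $texts["NAME"] . "\n";
```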
Starting up the parser
Now that the parser is set up and configured, we are ready to feed it our XML file and let it parse the information. This is the most complicated part of the program, so pay extra attention.
The first step is to open the XML file:
[php]$filename = "sample.xml";
if (!($fp = fopen($filename, "r"))) { die("cannot open " . $filename); }[/php]
This simple code will check to see if our program can open the file or not. It will quit with an appropriate message if it cannot.
Once the file is open, we must read it and feed it to the XML parser. Before we send the data to the parser, we are going to remove any whitespace between tags, using a regular expression and the eregi_replace() function:
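[php]while ($data = fread($fp, 4096)) {
    // Remove whitespace between a closing ">" and the next "<"
    $data = eregi_replace(">" . "[[:space:]]+" . "<", "><", $data);
    // The third argument tells the parser whether this is the last chunk
    if (!xml_parse($xmlparser, $data, feof($fp))) {
        $reason = xml_error_string(xml_get_error_code($xmlparser));
        $reason .= " at line " . xml_get_current_line_number($xmlparser);
        die($reason);
    }
}
fclose($fp);
xml_parser_free($xmlparser);[/php]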
Let's step through this code:
- The fread() function reads up to 4096 bytes from the XML file (given by the $fp handle) and stores them in $data.
- We use the eregi_replace() function to remove the whitespace between tags in $data.
- We then check whether the data was parsed; if it wasn't, we use the built-in XML error reporting functions to print an informative error message, including the line number.
- At the end, we free the parser (destroy it).
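All of the code above belongs on one page. Assembled, it looks roughly like the sketch below. It deviates from the tutorial in a few stated ways so that it runs on current PHP from the command line: the XML comes from an inline string fed to the parser in 16-byte chunks (imitating the fread() loop), preg_replace() stands in for eregi_replace() (which was removed in PHP 7), and output lines end in "\n" instead of HTML line breaks. str_split() requires PHP 5 or later.

```php
<?php
// Sketch: the tutorial's pieces on one page.
// Assumed: inline XML instead of sample.xml, preg_replace() instead of
// eregi_replace(), plain newlines instead of <br />.
function start_tag($parser, $name, $attribs) {
    echo "Current tag : " . $name . "\n";
    foreach ($attribs as $key => $val) {
        echo "Attribute " . $key . " has value " . $val . "\n";
    }
}

function end_tag($parser, $name) {
    echo "Reached ending tag " . $name . "\n";
}

function tag_contents($parser, $data) {
    echo "Contents : " . $data . "\n";
}

$xml = '<imageinfo> <size width="640" height="480"/> <name>Sunset</name> </imageinfo>';

// Remove whitespace between tags, as the tutorial does with eregi_replace().
$xml = preg_replace('/>\s+</', '><', $xml);

if (!($xmlparser = xml_parser_create())) {
    die("Cannot create parser");
}
xml_set_element_handler($xmlparser, "start_tag", "end_tag");
xml_set_character_data_handler($xmlparser, "tag_contents");

// Feed 16 bytes at a time; the third argument marks the last piece,
// just like feof($fp) in the fread() loop.
$chunks = str_split($xml, 16);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($xmlparser, $chunk, $i == $last)) {
        die(xml_error_string(xml_get_error_code($xmlparser))
            . " at line " . xml_get_current_line_number($xmlparser));
    }
}
xml_parser_free($xmlparser);
```

Because the input arrives in small chunks, you may see the text of <name> reported in more than one "Contents" line; that is normal parser behaviour, not an error.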
Once we have verified that our parser is working properly, we are ready to actually do something with the data.
Creating the gallery
Now that we have verified that our parser works, we are ready to modify our parser to actually make use of our information. In order to do this, we only have to deal with our custom functions that handle the data.
Let's print out a nice little gallery using our images. Our gallery will just print each image with its dimensions, and a caption that is the name of the image. I will type out the modified functions, and then explain the code:
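[php]$current = "";  // name of the tag the parser is currently inside

// Assumes each image lists <filename>, then <size>, then <name>;
// the HTML markup is illustrative.
function start_tag($parser, $name, $attribs) {
    global $current;
    $current = $name;
    if ($name == "IMAGEINFO") { echo "<table>\n"; }
    if ($name == "FILENAME")  { echo "<tr><td><img src=\""; }
    if ($name == "SIZE") {
        // Attribute names are upper-cased too, so lower-case them for HTML
        foreach ($attribs as $key => $val) {
            echo " " . strtolower($key) . "=\"" . $val . "\"";
        }
        echo " /><br />\n";
    }
}

function end_tag($parser, $name) {
    global $current;
    if ($name == "FILENAME")  { echo "\""; }  // close the src attribute
    if ($name == "NAME")      { echo "</td></tr>\n"; }
    if ($name == "IMAGEINFO") { echo "</table>\n"; }
    $current = "";  // text after a closing tag belongs to no field
}

function tag_contents($parser, $data) {
    global $current;
    if ($current == "FILENAME") { echo $data; }
    if ($current == "NAME") { echo $data; }
}[/php]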
Since the tag_contents() function doesn't receive the name of the current tag from the parser, we have to provide that information ourselves. In our start_tag() function, we set a global variable, $current, to the current tag name. The rest of the code just checks which tag we are on and prints out the appropriate HTML.
That’s it! Now you have a “skeleton” parser that you can modify to use with XML files (such as RSS feeds).
Notes
You'll notice that I am comparing tag names in upper case. By default, the parser converts all tag (and attribute) names to upper case. This behavior, called case folding, can be turned off with xml_parser_set_option() and the XML_OPTION_CASE_FOLDING option.
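For example, the following sketch switches case folding off before parsing, so tag and attribute names are reported exactly as written. The handler names are hypothetical. If you do this, remember to compare against the original spelling (for example "imageinfo" instead of "IMAGEINFO") in your handlers.

```php
<?php
// Sketch: disable case folding so names keep the case used in the XML.
// keep_case_start/keep_case_end are hypothetical names for this example.
$names = array();

function keep_case_start($parser, $name, $attribs) {
    global $names;
    $names[] = $name;                 // element name as written
    foreach ($attribs as $key => $val) {
        $names[] = $key;              // attribute names are not folded either
    }
}

function keep_case_end($parser, $name) {
    // nothing to do for closing tags in this example
}

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, "keep_case_start", "keep_case_end");
xml_parse($p, '<imageInfo><size width="640"/></imageInfo>', true);
xml_parser_free($p);

echo implode(", ", $names) . "\n";
```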
Comments
Thank You!!!
God, I’ve been looking around on the internet for over two days looking for a simple example of how to parse XML and output it to the browser using PHP and this was it!! I implemented it and it worked perfectly w/ my XML file.
Taking this one further, however, how would you build pagination into this example? For example, I’ve got an XML file that is huge, and I want to break the output into 5-6 pages so the user doesn’t have to scroll forever to see all the data. Any thoughts? Thanks for the great tutorial.
Well, by default the XML parser does a ‘one-shot’ parse, which means that it will parse the entire document, so there is no way to have it parse the document in bits (well, there is, but it's really not the best approach to solve this problem).
My suggestion to you would be to store the parsed information in an array, or some other construct (perhaps you could store it in a database?) then you have access to all the functions that PHP provides for fast pagination.
If you want to take this XML parsing one step further, I recommend you explore the XML_Serializer package in PEAR. It allows you to convert XML documents into native PHP types (such as arrays). Then you can display only parts of this array, passing around the index and the array in a session variable for multi-page parsing.
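The array approach suggested above might look something like this sketch: the parser fills $images with one entry per image, and array_slice() picks out a single page. The <image> wrapper element, the generated test data, and the $page and $per_page values are assumptions for illustration; on a real page $page would come from the query string.

```php
<?php
// Sketch: parse all images into an array, then show one page of them.
// Assumed: an <image> wrapper per picture, generated sample data,
// and fixed $page/$per_page values.
$images = array();
$current = "";
$record = array();

function page_start($parser, $name, $attribs) {
    global $current, $record;
    $current = $name;
    if ($name == "IMAGE") { $record = array("FILENAME" => "", "NAME" => ""); }
}

function page_end($parser, $name) {
    global $current, $record, $images;
    if ($name == "IMAGE") { $images[] = $record; }  // one finished image
    $current = "";
}

function page_data($parser, $data) {
    global $current, $record;
    if ($current == "FILENAME" || $current == "NAME") {
        $record[$current] .= $data;  // append: data may arrive in pieces
    }
}

// Build a test document with 12 images.
$xml = "<imageinfo>";
for ($n = 1; $n <= 12; $n++) {
    $xml .= "<image><filename>img$n.jpg</filename><name>Image $n</name></image>";
}
$xml .= "</imageinfo>";

$p = xml_parser_create();
xml_set_element_handler($p, "page_start", "page_end");
xml_set_character_data_handler($p, "page_data");
xml_parse($p, $xml, true);
xml_parser_free($p);

$per_page = 5;
$page = 2;  // hypothetical; normally read from $_GET
$shown = array_slice($images, ($page - 1) * $per_page, $per_page);
foreach ($shown as $image) {
    echo $image["FILENAME"] . " - " . $image["NAME"] . "\n";
}
echo "Page $page of " . ceil(count($images) / $per_page) . "\n";
```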
Hi Burhan, I concur with Coby. Thanks! I'm a very accomplished programmer who's been putting off dealing with XML in PHP4. I happened across your tutorial and “got it”. - Jim
Hello,
I'm trying to fulfill my boss's request to build a page with up-to-date weather information, such as what is offered by Weather Underground, but without their logo. Unfortunately, I am trained in ASP, not PHP, so I'm having some difficulty “getting” this. I'm working through your tutorial, but my first question is about the first bit of PHP code ("Cannot create parser"): am I supposed to put that into a new page? What steps do I need to take to check this? Also, do the tag functions go in the same page? I'm feeling beyond stupid with this. Thanks for all of your help!
Hi Jeri:
I will try to answer your questions as best as I can, please let me know if you have any further questions.
Yes, you are to put this in a page. Actually, all the code should go into one page; I have just presented it bit by bit to explain the different sections. Once you have written the first snippet of code, save the page and browse to it from the web. If you don't get the error message, chances are you have the XML parsing functions available and you can proceed (assuming PHP has been set up properly).
Yes, the tag functions, along with everything else is to go on one page (otherwise, some functions will not work as they rely on bits of code from other sections).
However, please note that this tutorial is for PHP4, if you are using PHP5, there is another tutorial that you can check out (also on this site).
hmmm i’m a bit confused.
the script works up until i add the function to create a gallery.
then i get this error:
Parse error: syntax error, unexpected $end in /home/kylieros/public_html/xml/gallery.php
Kayloe:
If you can post your code at a pastebin site (like http://www.pastebin.com) and reply with the link, I will see what the problem is.
good work Burhan. Much appreciated.
Thanks Burhan,
PHP4 seems to like XML much less than PHP5. Us ASP boys need as much help as we can get
Great tutorial