Qt Learning Road (60): Using DOM to Process XML

兰博

DOM is a standard interface for processing XML documents, proposed by the W3C. Qt provides a non-validating DOM Level 2 implementation for reading and writing XML documents.

Unlike the stream approach described in the previous chapter, DOM reads the entire XML document at once and builds a tree (the DOM tree) in memory. We can navigate this tree, for example by moving to the next node or back to the previous one, modify it, or save it directly to an XML file on disk. Consider the following XML fragment:

Scio me nihil scire

I know that I know nothing

We can think of it as the following DOM tree:

Document

The DOM tree shown above contains different types of nodes. For example, a node of type Element has a start tag and a corresponding end tag, and the content between the two tags forms the child nodes of that Element node. In Qt, the names of all DOM node types start with QDom, so QDomElement is an Element node and QDomText is a Text node. Different types of nodes allow different types of child nodes. For example, an Element node may contain other Element nodes as well as nodes of other types, such as EntityReference, Text, CDATASection, ProcessingInstruction and Comment. According to the W3C specification, the containment rules are as follows:

[Document]
  <- [Element]
  <- DocumentType
  <- ProcessingInstruction
  <- Comment
[Attr]
  <- [EntityReference]
  <- Text
[DocumentFragment] | [Element] | [EntityReference] | [Entity]
  <- [Element]
  <- [EntityReference]
  <- Text
  <- CDATASection
  <- ProcessingInstruction
  <- Comment
In the table above, node types written in square brackets can have child nodes; the others cannot.
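The table can also be written down as a small lookup function. The following is only a minimal sketch in plain C++ with no Qt involved; the NodeType enum and the canContain() helper are hypothetical names introduced for illustration and are not part of Qt's API.

```cpp
#include <cassert>

// Hypothetical node kinds, named after the node types in the table.
enum class NodeType {
    Document, DocumentFragment, DocumentType, Element, Attr, Entity,
    EntityReference, Text, CDATASection, ProcessingInstruction, Comment
};

// Returns true if, according to the table, a node of type `parent`
// may have a child of type `child`.
bool canContain(NodeType parent, NodeType child)
{
    switch (parent) {
    case NodeType::Document:
        return child == NodeType::Element
            || child == NodeType::DocumentType
            || child == NodeType::ProcessingInstruction
            || child == NodeType::Comment;
    case NodeType::Attr:
        return child == NodeType::EntityReference
            || child == NodeType::Text;
    case NodeType::DocumentFragment:
    case NodeType::Element:
    case NodeType::EntityReference:
    case NodeType::Entity:
        return child == NodeType::Element
            || child == NodeType::EntityReference
            || child == NodeType::Text
            || child == NodeType::CDATASection
            || child == NodeType::ProcessingInstruction
            || child == NodeType::Comment;
    default:
        // Types not written in [] in the table cannot have children.
        return false;
    }
}
```

For instance, canContain(NodeType::Element, NodeType::Text) is true, while canContain(NodeType::Text, NodeType::Element) is false because Text is not a bracketed type.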

Below we again use the books.xml file from the previous chapter as an example. The goal of the program is unchanged: display the structure of this file in a QTreeWidget. Note that because we use DOM to process XML, both Qt4 and Qt5 require the following line in the .pro file:

QT += xml

The header file is similar to the one in the previous chapter:

#include <QMainWindow>

// Forward declarations are enough for the pointer and reference members below.
class QDomElement;
class QTreeWidget;
class QTreeWidgetItem;

class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    MainWindow(QWidget *parent = 0);
    ~MainWindow();

    bool readFile(const QString &fileName);
private:
    void parseBookindexElement(const QDomElement &element);
    void parseEntryElement(const QDomElement &element, QTreeWidgetItem *parent);
    void parsePageElement(const QDomElement &element, QTreeWidgetItem *parent);
    QTreeWidget *treeWidget;
};
The constructor and destructor of MainWindow are the same as in the previous chapter:

MainWindow::MainWindow(QWidget *parent)
    : QMainWindow(parent)
{
    setWindowTitle(tr("XML DOM Reader"));

    treeWidget = new QTreeWidget(this);
    QStringList headers;
    headers << "Items" << "Pages";
    treeWidget->setHeaderLabels(headers);
    setCentralWidget(treeWidget);
}

MainWindow::~MainWindow()
{
}
The readFile() function has changed:

bool MainWindow::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        QMessageBox::critical(this, tr("Error"),
                              tr("Cannot read file %1").arg(fileName));
        return false;
    }

    // Output parameters filled in by setContent() if parsing fails.
    QString errorStr;
    int errorLine;
    int errorColumn;

    // Build the whole DOM tree in memory, without namespace processing.
    QDomDocument doc;
    if (!doc.setContent(&file, false, &errorStr, &errorLine,
                        &errorColumn)) {
        QMessageBox::critical(this, tr("Error"),
                              tr("Parse error at line %1, column %2: %3")
                              .arg(errorLine).arg(errorColumn).arg(errorStr));
        return false;
    }

    // The root element must be <bookindex>.
    QDomElement root = doc.documentElement();
    if (root.tagName() != "bookindex") {
        QMessageBox::critical(this, tr("Error"),
                              tr("Not a bookindex file"));
        return false;
    }

    parseBookindexElement(root);
    return true;
}
The readFile() function is noticeably longer and more complicated. First we open the file with QFile, exactly as before. Then we create a QDomDocument object to represent the entire document. Recall from the structure described above that Document is the root node of the DOM tree; here that is the QDomDocument. We use its setContent() function to fill the DOM tree. Two overloads of setContent() are:

bool QDomDocument::setContent(QIODevice *dev,
                              bool namespaceProcessing,
                              QString *errorMsg = 0,
                              int *errorLine = 0,
                              int *errorColumn = 0)

bool QDomDocument::setContent(const QByteArray &data,
                              bool namespaceProcessing,
                              QString *errorMsg = 0,
                              int *errorLine = 0,
                              int *errorColumn = 0)
The two functions take nearly the same parameters; each has five. They differ only in the first: one takes a QIODevice pointer, the other a QByteArray holding the data that has actually been read. Such data can be obtained from a QIODevice, and QFile is a subclass of QIODevice. The second parameter determines whether namespaces are processed. If it is true, the parser recognizes namespaces and sets the prefix, local name and namespace URI of the nodes; because our XML document uses no namespaces, we simply pass false. The remaining three parameters are for error handling. All three are output parameters: we pass in pointers, and if parsing fails the function writes the error message, line and column through them so that we can inspect them after the call.
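The output-parameter convention used by the last three arguments can be sketched in plain C++. The function below, checkAngleBrackets(), is a hypothetical stand-in invented for this illustration; it is not QDomDocument::setContent() and does no real XML parsing. It only checks that every '<' is closed by a '>' and reports the position of the first problem through optional pointers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a parser entry point (NOT Qt's setContent()).
// Returns true if every '<' is closed by a '>' before the next '<'.
// On failure it returns false and, for each non-null pointer, stores the
// error description and the 1-based line/column of the offending character.
bool checkAngleBrackets(const std::string &text,
                        std::string *errorMsg = nullptr,
                        int *errorLine = nullptr,
                        int *errorColumn = nullptr)
{
    int line = 1;
    int column = 0;
    int openLine = 0;
    int openColumn = 0;
    bool open = false;              // true while inside "<...>"
    const char *problem = nullptr;  // set when an error is found

    for (char c : text) {
        if (c == '\n') {
            ++line;
            column = 0;
            continue;
        }
        ++column;
        if (c == '<') {
            if (open) {
                problem = "unexpected '<'";
                break;
            }
            open = true;
            openLine = line;
            openColumn = column;
        } else if (c == '>') {
            if (!open) {
                problem = "unexpected '>'";
                break;
            }
            open = false;
        }
    }
    if (!problem && open) {
        // Report the position of the '<' that was never closed.
        problem = "unclosed '<'";
        line = openLine;
        column = openColumn;
    }
    if (!problem)
        return true;

    // Write results only through the pointers the caller supplied.
    if (errorMsg) *errorMsg = problem;
    if (errorLine) *errorLine = line;
    if (errorColumn) *errorColumn = column;
    return false;
}
```

A caller that does not need the details can leave the pointers out, as in checkAngleBrackets(text); this corresponds to the default value 0 in the setContent() signatures.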

When QDomDocument::setContent() returns without error, we call QDomDocument::documentElement() to get the root element of the document. If its tag is bookindex, we continue processing; otherwise we report an error.

void MainWindow::parseBookindexElement(const QDomElement &element)
{
    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(),
                              treeWidget->invisibleRootItem());
        }
        child = child.nextSibling();
    }
}
If the root tag is correct, we take its first child and check that it is not null, that is, that a child actually exists. We then check whether its tag name is entry. If it is, we are looking at an entry tag and call the function that handles it. Either way, we then move on to the next node (the return value of nextSibling()) and repeat the check. Note that this if selects only entry tags for processing and silently ignores everything else.

Both firstChild() and nextSibling() return a QDomNode, the base class of all node classes. To operate on a node we must convert it to the correct subclass; in this example we use toElement() to convert a QDomNode to a QDomElement. If the conversion fails, the result is a null QDomElement whose tagName() returns an empty string. The comparison then fails, which is exactly what we want.
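The first-child/next-sibling loop can be modelled without Qt. The sketch below uses a toy Node struct and two helper functions, tagNameOf() and childElements(); all three are hypothetical names made up for this illustration and only imitate the shape of the QDomNode calls used in parseBookindexElement().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy node, NOT Qt's QDomNode: it only models the two links the loop follows.
struct Node {
    bool isElement = false;      // false models a non-element node such as Text
    std::string name;            // tag name; meaningful for elements only
    Node *firstChild = nullptr;
    Node *nextSibling = nullptr;
};

// Models child.toElement().tagName(): a non-element node yields an empty string.
std::string tagNameOf(const Node *node)
{
    return (node && node->isElement) ? node->name : std::string();
}

// Collects the direct children of `parent` whose tag name equals `tag`,
// using the same loop shape as parseBookindexElement().
std::vector<const Node *> childElements(const Node &parent, const std::string &tag)
{
    std::vector<const Node *> result;
    for (const Node *child = parent.firstChild; child; child = child->nextSibling) {
        if (tagNameOf(child) == tag)
            result.push_back(child);
    }
    return result;
}
```

Because a non-element child produces an empty tag name, it never matches "entry" and is skipped without any extra type check, which is the behavior the paragraph above relies on.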

void MainWindow::parseEntryElement(const QDomElement &element,
                                   QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, element.attribute("term"));

    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(), item);
        } else if (child.toElement().tagName() == "page") {
            parsePageElement(child.toElement(), item);
        }
        child = child.nextSibling();
    }
}

parseEntryElement() first creates a tree item under the given parent and sets its first column to the value of the term attribute. Then we traverse the child tags of the entry tag. If a child is itself an entry tag, the function calls itself recursively with the current item as the parent; if it is a page tag, it calls parsePageElement().

void MainWindow::parsePageElement(const QDomElement &element,
                                  QTreeWidgetItem *parent)
{
    QString page = element.text();
    QString allPages = parent->text(1);
    if (!allPages.isEmpty()) {
        allPages += ", ";
    }
    allPages += page;
    parent->setText(1, allPages);
}
parsePageElement() is relatively simple: we append the page text to the second column of the parent item, separating multiple pages with commas. This is much the same as in the previous chapter.

The program's output is exactly the same as in the previous chapter, so no screenshot is shown here.

This example shows that when we process an XML document with DOM, nothing after the initial setCont() call touches the original document. In other words, once setContent() has returned, a complete DOM tree exists in memory and we can move around it freely, for example by taking an adjacent node with nextSibling(). By contrast, with the stream approach in the previous chapter, even though we also closed the file early, we could only ever move forward with readNext(); there is no function like readPrevious().