<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ars Longa, Vita Brevis &#187; XML</title>
	<atom:link href="http://blog.sjinks.pro/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.sjinks.pro</link>
	<description>Quod scripsi, scripsi</description>
	<lastBuildDate>Mon, 06 Feb 2012 17:56:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>HTML Parser for Qt</title>
		<link>http://blog.sjinks.pro/c-cpp/qt/942-html-parser-qt/</link>
		<comments>http://blog.sjinks.pro/c-cpp/qt/942-html-parser-qt/#comments</comments>
		<pubDate>Wed, 07 Sep 2011 08:03:07 +0000</pubDate>
		<dc:creator>Vladimir</dc:creator>
				<category><![CDATA[Qt]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.sjinks.pro/?p=942</guid>
		<description><![CDATA[Using libxml2 to parse HTML documents in Qt. XML is all well and good, but very often you need to parse HTML documents, which may not be valid. Qt has many classes for working with XML, but they are not suitable for HTML, because errors in HTML are fatal for them. Below is one possible HTML parser, based on [...]<p>© 2012 <a href="http://blog.sjinks.pro">Ars Longa, Vita Brevis</a>. All rights reserved. Republishing these materials without the author's permission is prohibited.</p>
<p>When using materials from this blog, an active link to the <a href="http://blog.sjinks.pro/c-cpp/qt/942-html-parser-qt/">source</a> that is not hidden from indexing is required.</p>]]></description>
			<content:encoded><![CDATA[<h2><em>Using libxml2 to parse <a href="http://blog.sjinks.pro/tag/html/" class="st_tag internal_tag" rel="tag" title="Posts tagged HTML">HTML</a> documents in <a href="http://blog.sjinks.pro/tag/qt/" class="st_tag internal_tag" rel="tag" title="Posts tagged Qt">Qt</a></em></h2>
<p><a href="http://blog.sjinks.pro/tag/xml/" class="st_tag internal_tag" rel="tag" title="Posts tagged XML">XML</a> is all well and good, but very often you need to parse HTML documents, which may not be valid.</p>
<p>Qt has many classes for working with XML, but they are not suitable for HTML, because errors in HTML are fatal for them.</p>
<p>Below is one possible HTML parser, based on the libxml2 library.<span id="more-942"></span></p>
          
<div class="codebox">
    <div class="the_code" style="" id="p9425">
        <div class="code cpp-qt" id="p942code5">
<span class="co2">#ifndef LIBXML2READER_H</span><br />
<span class="co2">#define LIBXML2READER_H</span><br />
<br />
<span class="co2">#include &lt;QtCore/QScopedPointer&gt;</span><br />
<span class="co2">#include &lt;QtXml/QXmlReader&gt;</span><br />
<span class="co2">#include &lt;libxml/xmlstring.h&gt;</span><br />
<br />
<span class="kw2">class</span> LibXml2ReaderPrivate<span class="sy0">;</span><br />
<br />
<span class="kw2">class</span> LibXml2Reader <span class="sy0">:</span> <span class="kw2">public</span> <span class="kw5">QXmlReader</span> <span class="br0">&#123;</span><br />
<span class="kw2">public</span><span class="sy0">:</span><br />
&nbsp; &nbsp; LibXml2Reader<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> ~LibXml2Reader<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">bool</span> feature<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="sy0">,</span> <span class="kw4">bool</span><span class="sy0">*</span> ok <span class="sy0">=</span> 0<span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setFeature<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="sy0">,</span> <span class="kw4">bool</span> value<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">bool</span> hasFeature<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span><span class="sy0">*</span> property<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="sy0">,</span> <span class="kw4">bool</span><span class="sy0">*</span> ok <span class="sy0">=</span> 0<span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setProperty<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="sy0">,</span> <span class="kw4">void</span><span class="sy0">*</span> value<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">bool</span> hasProperty<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="br0">&#41;</span> const<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setEntityResolver<span class="br0">&#40;</span><span class="kw5">QXmlEntityResolver</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlEntityResolver</span><span class="sy0">*</span> entityResolver<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setDTDHandler<span class="br0">&#40;</span><span class="kw5">QXmlDTDHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlDTDHandler</span><span class="sy0">*</span> DTDHandler<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setContentHandler<span class="br0">&#40;</span><span class="kw5">QXmlContentHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlContentHandler</span><span class="sy0">*</span> contentHandler<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setErrorHandler<span class="br0">&#40;</span><span class="kw5">QXmlErrorHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlErrorHandler</span><span class="sy0">*</span> errorHandler<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setLexicalHandler<span class="br0">&#40;</span><span class="kw5">QXmlLexicalHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlLexicalHandler</span><span class="sy0">*</span> lexicalHandler<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">void</span> setDeclHandler<span class="br0">&#40;</span><span class="kw5">QXmlDeclHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw5">QXmlDeclHandler</span><span class="sy0">*</span> declHandler<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">bool</span> parse<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">&amp;</span> input<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">bool</span> parse<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">*</span> input<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
<br />
<span class="kw2">private</span><span class="sy0">:</span><br />
&nbsp; &nbsp; Q_DISABLE_COPY<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><br />
&nbsp; &nbsp; Q_DECLARE_PRIVATE<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><br />
&nbsp; &nbsp; QScopedPointer<span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">&gt;</span> d_ptr<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw2">friend</span> <span class="kw2">class</span> LibXml2ReaderLocator<span class="sy0">;</span><br />
<span class="br0">&#125;</span><span class="sy0">;</span><br />
<br />
<span class="co2">#endif // LIBXML2READER_H</span>
        </div>
    </div>
</div>
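<p>The implementation that follows registers static member functions as libxml2 SAX callbacks and passes the object itself as the opaque <code>void*</code> context. A minimal sketch of that pattern, using only the standard library, may help before reading it. Note that <code>FakeSaxHandler</code>, <code>Collector</code> and <code>emitSampleElement</code> are hypothetical names invented for this illustration; they are not part of libxml2 or Qt. The attribute array layout (name/value pairs ending with a null pointer, with a null value for attributes such as <code>hidden</code>) is the layout the <code>startElement</code> code in this post assumes.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Plays the role of xmlChar in this sketch.
typedef unsigned char Char;

// Hypothetical stand-in for a C SAX handler table (NOT libxml2's htmlSAXHandler):
// a C library only knows plain function pointers and an opaque context pointer.
struct FakeSaxHandler {
    void (*startElement)(void* ctx, const Char* name, const Char** attrs);
};

// Hypothetical receiver; plays the role of the private class in the post.
class Collector {
public:
    // A static member function has no implicit `this`, so its address
    // can be stored in a plain C function pointer.
    static void startElement(void* ctx, const Char* name, const Char** attrs)
    {
        assert(ctx != nullptr);
        // Recover the object from the opaque context pointer.
        Collector* self = static_cast<Collector*>(ctx);
        self->names.push_back(reinterpret_cast<const char*>(name));

        // attrs is either null or a null-terminated array of name/value pairs.
        if (attrs) {
            for (int i = 0; attrs[i]; i += 2) {
                const char* key   = reinterpret_cast<const char*>(attrs[i]);
                const char* value = reinterpret_cast<const char*>(attrs[i + 1]);
                // A valueless attribute falls back to its own name,
                // mirroring what the code in this post does.
                self->attributes.push_back(
                    std::make_pair(std::string(key), std::string(value ? value : key)));
            }
        }
    }

    std::vector<std::string> names;
    std::vector<std::pair<std::string, std::string> > attributes;
};

// Simulates the event a parser would emit for: <a href="/" hidden>
inline Collector emitSampleElement()
{
    static const Char a[]      = "a";
    static const Char href[]   = "href";
    static const Char slash[]  = "/";
    static const Char hidden[] = "hidden";
    const Char* attrs[] = { href, slash, hidden, nullptr, nullptr };

    FakeSaxHandler handler;
    handler.startElement = &Collector::startElement;

    Collector collector;
    handler.startElement(&collector, a, attrs);  // the "library" calls back into the object
    return collector;
}
```

<p>Calling <code>emitSampleElement()</code> yields a collector holding one element name and two attributes. The sketch uses <code>static_cast</code> to recover the object from <code>void*</code>; the listing below uses <code>reinterpret_cast</code>, which behaves the same way for this conversion.</p>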

          
<div class="codebox">
    <div class="the_code" style="" id="p9426">
        <div class="code cpp-qt" id="p942code6">
<span class="co2">#include &quot;libxml2reader.h&quot;</span><br />
<span class="co2">#include &lt;cstring&gt;</span><br />
<span class="co2">#include &lt;libxml/tree.h&gt;</span><br />
<span class="co2">#include &lt;libxml/parser.h&gt;</span><br />
<span class="co2">#include &lt;libxml/HTMLparser.h&gt;</span><br />
<br />
<span class="kw2">class</span> LibXml2ReaderLocator <span class="sy0">:</span> <span class="kw2">public</span> <span class="kw5">QXmlLocator</span> <span class="br0">&#123;</span><br />
<span class="kw2">public</span><span class="sy0">:</span><br />
&nbsp; &nbsp; LibXml2ReaderLocator<span class="br0">&#40;</span>LibXml2Reader<span class="sy0">*</span> r<span class="br0">&#41;</span> <span class="sy0">:</span> reader<span class="br0">&#40;</span>r<span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">int</span> columnNumber<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw2">virtual</span> <span class="kw4">int</span> lineNumber<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> const<span class="sy0">;</span><br />
<span class="kw2">private</span><span class="sy0">:</span><br />
&nbsp; &nbsp; LibXml2Reader<span class="sy0">*</span> reader<span class="sy0">;</span><br />
<span class="br0">&#125;</span><span class="sy0">;</span><br />
<br />
<span class="kw2">class</span> LibXml2ReaderPrivate <span class="br0">&#123;</span><br />
<span class="kw2">public</span><span class="sy0">:</span><br />
&nbsp; &nbsp; ~LibXml2ReaderPrivate<span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
<span class="kw2">private</span><span class="sy0">:</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="br0">&#40;</span>LibXml2Reader<span class="sy0">*</span> reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> startDocument<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> endDocument<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> startElement<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">**</span> attrs<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> endElement<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> comment<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> value<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> cdataBlock<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> value<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> processingInstruction<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> target<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> data<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> characters<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> ch<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> ignorableWhitespace<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> ch<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">static</span> <span class="kw4">void</span> internalSubset<span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> publicId<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> systemId<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw4">void</span> parse<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">*</span> input<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; QScopedPointer<span class="sy0">&lt;</span>LibXml2ReaderLocator<span class="sy0">&gt;</span> locator<span class="sy0">;</span><br />
&nbsp; &nbsp; Q_DECLARE_PUBLIC<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><br />
&nbsp; &nbsp; LibXml2Reader<span class="sy0">*</span> q_ptr<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw5">QXmlEntityResolver</span><span class="sy0">*</span> entityresolver<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlDTDHandler</span><span class="sy0">*</span> &nbsp; &nbsp; dtdhandler<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlContentHandler</span><span class="sy0">*</span> contenthandler<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlErrorHandler</span><span class="sy0">*</span> &nbsp; errorhandler<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlLexicalHandler</span><span class="sy0">*</span> lexicalhandler<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlDeclHandler</span><span class="sy0">*</span> &nbsp; &nbsp;declhandler<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; xmlParserCtxt<span class="sy0">*</span> context<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw2">friend</span> <span class="kw2">class</span> LibXml2ReaderLocator<span class="sy0">;</span><br />
<span class="br0">&#125;</span><span class="sy0">;</span><br />
<br />
LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">LibXml2ReaderPrivate</span><span class="br0">&#40;</span>LibXml2Reader<span class="sy0">*</span> reader<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="sy0">:</span> q_ptr<span class="br0">&#40;</span>reader<span class="br0">&#41;</span><span class="sy0">,</span> entityresolver<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> dtdhandler<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> contenthandler<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> errorhandler<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> lexicalhandler<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> declhandler<span class="br0">&#40;</span>0<span class="br0">&#41;</span><span class="sy0">,</span> context<span class="br0">&#40;</span>0<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; this<span class="sy0">-&gt;</span><span class="me3">locator</span>.<span class="me1">reset</span><span class="br0">&#40;</span><span class="kw1">new</span> LibXml2ReaderLocator<span class="br0">&#40;</span>reader<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">parse</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">*</span> input<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; htmlSAXHandler handler<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QByteArray</span> arr <span class="sy0">=</span> input<span class="sy0">-&gt;</span><span class="me3">data</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">toLocal8Bit</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*</span> data <span class="sy0">=</span> arr.<span class="me1">data</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; std<span class="sy0">::</span><span class="kw3">memset</span><span class="br0">&#40;</span><span class="sy0">&amp;</span>handler<span class="sy0">,</span> 0<span class="sy0">,</span> <span class="kw3">sizeof</span><span class="br0">&#40;</span>handler<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">startDocument</span> &nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">startDocument</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">endDocument</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">endDocument</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">startElement</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">startElement</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">endElement</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">endElement</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">comment</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">comment</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">cdataBlock</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">cdataBlock</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">processingInstruction</span> <span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">processingInstruction</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">characters</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">characters</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">ignorableWhitespace</span> &nbsp; <span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">ignorableWhitespace</span><span class="sy0">;</span><br />
&nbsp; &nbsp; handler.<span class="me1">internalSubset</span> &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> <span class="sy0">&amp;</span>LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">internalSubset</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; this<span class="sy0">-&gt;</span><span class="me3">context</span> <span class="sy0">=</span> htmlCreatePushParserCtxt<span class="br0">&#40;</span><span class="sy0">&amp;</span>handler<span class="sy0">,</span> <span class="kw1">this</span><span class="sy0">,</span> data<span class="sy0">,</span> xmlStrlen<span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> xmlChar<span class="sy0">*&gt;</span><span class="br0">&#40;</span>data<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">,</span> <span class="st0">&quot;&quot;</span><span class="sy0">,</span> XML_CHAR_ENCODING_NONE<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; htmlParseChunk<span class="br0">&#40;</span>this<span class="sy0">-&gt;</span><span class="me3">context</span><span class="sy0">,</span> <span class="kw2">NULL</span><span class="sy0">,</span> 0<span class="sy0">,</span> 1<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; htmlFreeParserCtxt<span class="br0">&#40;</span>this<span class="sy0">-&gt;</span><span class="me3">context</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="co1">// xmlCleanupParser() is deliberately not called here: it releases libxml2's global state and should only run once, when the application is finished with the library.</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">startDocument</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">startDocument</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">endDocument</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">endDocument</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">startElement</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">**</span> attrs<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QXmlAttributes</span> a<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw4">int</span> i <span class="sy0">=</span> <span class="nu0">0</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>attrs<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> <span class="br0">&#40;</span>attrs<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*</span> name &nbsp;<span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>attrs<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*</span> value <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>attrs<span class="br0">&#91;</span>i<span class="sy0">+</span>1<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i <span class="sy0">+=</span> <span class="nu0">2</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; a.<span class="me1">append</span><span class="br0">&#40;</span>name<span class="sy0">,</span> <span class="st0">&quot;&quot;</span><span class="sy0">,</span> <span class="st0">&quot;&quot;</span><span class="sy0">,</span> value ? value <span class="sy0">:</span> name<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> uri <span class="sy0">=</span> <span class="st0">&quot;&quot;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> localName <span class="sy0">=</span> <span class="st0">&quot;&quot;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> qName <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>name<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">startElement</span><span class="br0">&#40;</span>uri<span class="sy0">,</span> localName<span class="sy0">,</span> qName<span class="sy0">,</span> a<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">endElement</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">endElement</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="br0">&#40;</span><span class="st0">&quot;&quot;</span><span class="br0">&#41;</span><span class="sy0">,</span> <span class="kw5">QString</span><span class="br0">&#40;</span><span class="st0">&quot;&quot;</span><span class="br0">&#41;</span><span class="sy0">,</span> <span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>name<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">comment</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> value<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">-&gt;</span><span class="me3">comment</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>value<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">cdataBlock</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> value<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">-&gt;</span><span class="me3">startCDATA</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QByteArray</span> arr<span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>value<span class="br0">&#41;</span><span class="sy0">,</span> len<span class="br0">&#41;</span><span class="sy0">;</span><br />
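&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">characters</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span>arr.<span class="me1">constData</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">,</span> arr.<span class="me1">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />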
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">-&gt;</span><span class="me3">endCDATA</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">processingInstruction</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> target<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> data<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">processingInstruction</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>target<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">,</span> <span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>data<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">characters</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> ch<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">characters</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>ch<span class="br0">&#41;</span><span class="sy0">,</span> len<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">ignorableWhitespace</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> ch<span class="sy0">,</span> <span class="kw4">int</span> len<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">ignorableWhitespace</span><span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>ch<span class="br0">&#41;</span><span class="sy0">,</span> len<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2ReaderPrivate<span class="sy0">::</span><span class="me2">internalSubset</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="sy0">*</span> c<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> name<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> publicId<span class="sy0">,</span> <span class="kw4">const</span> xmlChar<span class="sy0">*</span> systemId<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; LibXml2ReaderPrivate<span class="sy0">*</span> r <span class="sy0">=</span> <span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span>LibXml2ReaderPrivate<span class="sy0">*&gt;</span><span class="br0">&#40;</span>c<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> n<span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>name<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> p<span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>publicId<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> s<span class="br0">&#40;</span><span class="kw5">QString</span><span class="sy0">::</span><span class="me2">fromUtf8</span><span class="br0">&#40;</span><span class="kw2">reinterpret_cast</span><span class="sy0">&lt;</span><span class="kw4">const</span> <span class="kw4">char</span><span class="sy0">*&gt;</span><span class="br0">&#40;</span>systemId<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">-&gt;</span><span class="me3">startDTD</span><span class="br0">&#40;</span>n<span class="sy0">,</span> p<span class="sy0">,</span> s<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">-&gt;</span><span class="me3">endDTD</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
<span class="br0">&#125;</span><br />
<br />
LibXml2Reader<span class="sy0">::</span><span class="me2">LibXml2Reader</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="sy0">:</span> d_ptr<span class="br0">&#40;</span><span class="kw1">new</span> LibXml2ReaderPrivate<span class="br0">&#40;</span><span class="kw1">this</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
<span class="br0">&#125;</span><br />
<br />
LibXml2Reader<span class="sy0">::</span><span class="me2">~LibXml2Reader</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">bool</span> LibXml2Reader<span class="sy0">::</span><span class="me2">feature</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;,</span> <span class="kw4">bool</span><span class="sy0">*</span> ok<span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>ok<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">*</span>ok <span class="sy0">=</span> false<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">return</span> false<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setFeature</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;,</span> <span class="kw4">bool</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">bool</span> LibXml2Reader<span class="sy0">::</span><span class="me2">hasFeature</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> false<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">property</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;,</span> <span class="kw4">bool</span><span class="sy0">*</span> ok<span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>ok<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">*</span>ok <span class="sy0">=</span> false<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setProperty</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;,</span> <span class="kw4">void</span><span class="sy0">*</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">bool</span> LibXml2Reader<span class="sy0">::</span><span class="me2">hasProperty</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> false<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setEntityResolver</span><span class="br0">&#40;</span><span class="kw5">QXmlEntityResolver</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">entityresolver</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlEntityResolver</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">entityResolver</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">entityresolver</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setDTDHandler</span><span class="br0">&#40;</span><span class="kw5">QXmlDTDHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">dtdhandler</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlDTDHandler</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">DTDHandler</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">dtdhandler</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setContentHandler</span><span class="br0">&#40;</span><span class="kw5">QXmlContentHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">contenthandler</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlContentHandler</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">contentHandler</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setErrorHandler</span><span class="br0">&#40;</span><span class="kw5">QXmlErrorHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">errorhandler</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlErrorHandler</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">errorHandler</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">errorhandler</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setLexicalHandler</span><span class="br0">&#40;</span><span class="kw5">QXmlLexicalHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlLexicalHandler</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">lexicalHandler</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">lexicalhandler</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">void</span> LibXml2Reader<span class="sy0">::</span><span class="me2">setDeclHandler</span><span class="br0">&#40;</span><span class="kw5">QXmlDeclHandler</span><span class="sy0">*</span> handler<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">declhandler</span> <span class="sy0">=</span> handler<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw5">QXmlDeclHandler</span><span class="sy0">*</span> LibXml2Reader<span class="sy0">::</span><span class="me2">declHandler</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">const</span> LibXml2ReaderPrivate<span class="sy0">*</span> d <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> d<span class="sy0">-&gt;</span><span class="me3">declhandler</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">bool</span> LibXml2Reader<span class="sy0">::</span><span class="me2">parse</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">&amp;</span> input<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> this<span class="sy0">-&gt;</span><span class="me3">parse</span><span class="br0">&#40;</span><span class="sy0">&amp;</span>input<span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">bool</span> LibXml2Reader<span class="sy0">::</span><span class="me2">parse</span><span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QXmlInputSource</span><span class="sy0">*</span> input<span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; Q_D<span class="br0">&#40;</span>LibXml2Reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>d<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">contenthandler</span><span class="sy0">-&gt;</span><span class="me3">setDocumentLocator</span><span class="br0">&#40;</span>d<span class="sy0">-&gt;</span><span class="me3">locator</span>.<span class="me1">data</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; d<span class="sy0">-&gt;</span><span class="me3">parse</span><span class="br0">&#40;</span>input<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw1">return</span> true<span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">int</span> LibXml2ReaderLocator<span class="sy0">::</span><span class="me2">columnNumber</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> this<span class="sy0">-&gt;</span><span class="me3">reader</span><span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&gt;</span><span class="me3">context</span><span class="sy0">-&gt;</span><span class="me3">input</span><span class="sy0">-&gt;</span><span class="me3">col</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<br />
<span class="kw4">int</span> LibXml2ReaderLocator<span class="sy0">::</span><span class="me2">lineNumber</span><span class="br0">&#40;</span><span class="kw4">void</span><span class="br0">&#41;</span> <span class="kw4">const</span><br />
<span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> this<span class="sy0">-&gt;</span><span class="me3">reader</span><span class="sy0">-&gt;</span><span class="me3">d_func</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">-&gt;</span><span class="me3">context</span><span class="sy0">-&gt;</span><span class="me3">input</span><span class="sy0">-&gt;</span><span class="me3">line</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span>
        </div>
    </div>
</div>
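
<p>The static callbacks in the listing above all follow one pattern: the C library hands back an opaque <code>void*</code> context pointer, and the callback casts it back to the C++ object before forwarding the event. The following is a minimal, self-contained sketch of that pattern in standard C++, without Qt or libxml2. <code>MockSaxHandler</code>, <code>EventLog</code>, <code>emitSample</code> and <code>runSample</code> are hypothetical names invented for this illustration; they are not part of libxml2 or of the reader class.</p>

```cpp
#include <string>

// Hypothetical stand-in for a C library's SAX handler table: plain function
// pointers that receive an opaque user-data pointer as their first argument.
struct MockSaxHandler {
    void (*startElement)(void* ctx, const unsigned char* name);
    void (*characters)(void* ctx, const unsigned char* ch, int len);
    void (*endElement)(void* ctx, const unsigned char* name);
};

// Hypothetical C++ receiver, playing the role of the private reader class.
class EventLog {
public:
    // Static trampolines: recover the object from ctx, then record the event.
    static void onStart(void* ctx, const unsigned char* name)
    {
        EventLog* self = static_cast<EventLog*>(ctx);
        self->log += "<" + toString(name) + ">";
    }

    static void onChars(void* ctx, const unsigned char* ch, int len)
    {
        EventLog* self = static_cast<EventLog*>(ctx);
        // The buffer is not NUL-terminated, so only len bytes may be read.
        self->log.append(reinterpret_cast<const char*>(ch),
                         static_cast<std::string::size_type>(len));
    }

    static void onEnd(void* ctx, const unsigned char* name)
    {
        EventLog* self = static_cast<EventLog*>(ctx);
        self->log += "</" + toString(name) + ">";
    }

    std::string log;

private:
    // Element names are NUL-terminated, so a plain conversion is enough.
    static std::string toString(const unsigned char* s)
    {
        return std::string(reinterpret_cast<const char*>(s));
    }
};

// Toy driver that emits the events a parser would produce for "<b>hi</b>".
void emitSample(const MockSaxHandler& h, void* ctx)
{
    static const unsigned char tag[] = "b";
    static const unsigned char text[] = "hi there"; // only 2 bytes are passed on
    h.startElement(ctx, tag);
    h.characters(ctx, text, 2);
    h.endElement(ctx, tag);
}

// Wires the receiver to the handler table and returns the recorded events.
std::string runSample()
{
    EventLog receiver;
    MockSaxHandler handler = { &EventLog::onStart, &EventLog::onChars, &EventLog::onEnd };
    emitSample(handler, &receiver);
    return receiver.log;
}
```

<p>Calling <code>runSample()</code> returns <code>&lt;b&gt;hi&lt;/b&gt;</code>: the length argument, not a terminating NUL, decides how much of the text buffer is used, which is why the listing passes <code>len</code> along in <code>characters()</code>, <code>ignorableWhitespace()</code> and <code>cdataBlock()</code>.</p>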

<p>Usage:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p9427">
        <div class="code cpp-qt" id="p942code7">
&nbsp; &nbsp; <span class="kw5">QByteArray</span> data<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QXmlInputSource</span> src<span class="sy0">;</span><br />
&nbsp; &nbsp; LibXml2Reader reader<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw5">QDomDocument</span> doc<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="coMULTI">/* read the HTML document into data here */</span><br />
<br />
&nbsp; &nbsp; src.<span class="me1">setData</span><span class="br0">&#40;</span>data<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; doc.<span class="me1">setContent</span><span class="br0">&#40;</span><span class="sy0">&amp;</span>src<span class="sy0">,</span> <span class="sy0">&amp;</span>reader<span class="br0">&#41;</span><span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="coMULTI">/* doc now holds the DOM tree built from the HTML document */</span>
        </div>
    </div>
</div>

<p>Add the following two lines to the project file:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p9428">
        <div class="code text" id="p942code8">
INCLUDEPATH += /usr/include/libxml2<br />
LIBS &nbsp; &nbsp; &nbsp; &nbsp;+= -lxml2
        </div>
    </div>
</div>

<p>В <code>INCLUDEPATH</code> помещается путь к заголовочным файлам libxml2, в <code>LIBS</code> — опции компилятора для подключения библиотеки libxml2.</p>
<p>Download:</p>
<ul>
<li><a href='http://static.sjinks.info/wp-content/uploads/2011/09/libxml2reader.h'>libxml2reader.h</a></li>
<li><a href='http://static.sjinks.info/wp-content/uploads/2011/09/libxml2reader.cpp'>libxml2reader.cpp</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.sjinks.pro/c-cpp/qt/942-html-parser-qt/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Working Implementation of QDomDocument::elementById</title>
		<link>http://blog.sjinks.pro/c-cpp/qt/906-working-qdomdocument-elementbyid/</link>
		<comments>http://blog.sjinks.pro/c-cpp/qt/906-working-qdomdocument-elementbyid/#comments</comments>
		<pubDate>Sat, 19 Mar 2011 13:52:03 +0000</pubDate>
		<dc:creator>Vladimir</dc:creator>
				<category><![CDATA[Qt]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XPath]]></category>

		<guid isPermaLink="false">http://blog.sjinks.pro/?p=906</guid>
		<description><![CDATA[Getting rid of the “elementById() is not implemented and will always return a null node” warning. As it happens, the QDomDocument::elementById() method in Qt does not work: calling it prints the warning elementById() is not implemented and will always return a null node and returns a null DOM element. At times this is very inconvenient: for example, instead of using XPath from [...]]]></description>
			<content:encoded><![CDATA[<h2><em>Getting rid of the “elementById() is not implemented and will always return a null node” warning</em></h2>
<p>As it <a href="http://doc.trolltech.com/4.7/qdomdocument.html#elementById">happens</a>, the <code>QDomDocument::elementById()</code> method in <a href="http://blog.sjinks.pro/tag/qt/" class="st_tag internal_tag" rel="tag" title="Posts tagged Qt">Qt</a> does not work: calling it prints the warning <strong>elementById() is not implemented and will always return a null node</strong> and returns a null <a href="http://blog.sjinks.pro/tag/dom/" class="st_tag internal_tag" rel="tag" title="Posts tagged DOM">DOM</a> element.</p>
<p>At times this is very inconvenient: for example, instead of using <a href="http://blog.sjinks.pro/tag/xpath/" class="st_tag internal_tag" rel="tag" title="Posts tagged XPath">XPath</a> from QXmlPatterns, it can be simpler to get a DOM element by its <code>id</code> and walk through its descendants. And with XPath, the <code>id()</code> function cannot be used either, for the same reasons.<span id="more-906"></span></p>
<p>Below is an example of a working implementation of <code>QDomDocument::elementById()</code>.</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p90610">
        <div class="code cpp-qt" id="p906code10">
<span class="co2">#include &lt;QtXml/QDomDocument&gt;</span><br />
<span class="co2">#include &lt;QMap&gt;</span><br />
<br />
<span class="kw2">class</span> QMyDomDocument <span class="sy0">:</span> <span class="kw2">public</span> <span class="kw5">QDomDocument</span> <span class="br0">&#123;</span><br />
<span class="kw2">public</span><span class="sy0">:</span><br />
&nbsp; &nbsp; QMyDomDocument<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">:</span> <span class="kw5">QDomDocument</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; QMyDomDocument<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> name<span class="br0">&#41;</span> <span class="sy0">:</span> <span class="kw5">QDomDocument</span><span class="br0">&#40;</span>name<span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; QMyDomDocument<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QDomDocumentType</span><span class="sy0">&amp;</span> doctype<span class="br0">&#41;</span> <span class="sy0">:</span> <span class="kw5">QDomDocument</span><span class="br0">&#40;</span>doctype<span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; <span class="kw5">QDomElement</span> elementById<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> id<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>map.<span class="me1">contains</span><span class="br0">&#40;</span>id<span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QDomElement</span> e <span class="sy0">=</span> map<span class="br0">&#91;</span>id<span class="br0">&#93;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>e.<span class="me1">parentNode</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">nodeType</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="sy0">!=</span> <span class="kw5">QDomNode</span><span class="sy0">::</span><span class="me2">BaseNode</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> e<span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; map.<span class="kw3">remove</span><span class="br0">&#40;</span>id<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw4">bool</span> res <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">find</span><span class="br0">&#40;</span>this<span class="sy0">-&gt;</span><span class="me3">documentElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">,</span> id<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>res<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> map<span class="br0">&#91;</span>id<span class="br0">&#93;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw5">QDomElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
<span class="kw2">private</span><span class="sy0">:</span><br />
&nbsp; &nbsp; <span class="kw5">QMap</span><span class="sy0">&lt;</span><span class="kw5">QString</span><span class="sy0">,</span> <span class="kw5">QDomElement</span><span class="sy0">&gt;</span> map<span class="sy0">;</span><br />
<br />
&nbsp; &nbsp; <span class="kw4">bool</span> find<span class="br0">&#40;</span><span class="kw5">QDomElement</span> node<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw5">QString</span><span class="sy0">&amp;</span> id<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>node.<span class="me1">hasAttribute</span><span class="br0">&#40;</span><span class="st0">&quot;id&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QString</span> value <span class="sy0">=</span> node.<span class="me1">attribute</span><span class="br0">&#40;</span><span class="st0">&quot;id&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; this<span class="sy0">-&gt;</span><span class="me3">map</span><span class="br0">&#91;</span>value<span class="br0">&#93;</span> <span class="sy0">=</span> node<span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>value <span class="sy0">==</span> id<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> true<span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> i<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> i<span class="sy0">&lt;</span>node.<span class="me1">childNodes</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">length</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span> <span class="sy0">++</span>i<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw5">QDomNode</span> n <span class="sy0">=</span> node.<span class="me1">childNodes</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">at</span><span class="br0">&#40;</span>i<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>n.<span class="me1">isElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw4">bool</span> res <span class="sy0">=</span> this<span class="sy0">-&gt;</span><span class="me3">find</span><span class="br0">&#40;</span>n.<span class="me1">toElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">,</span> id<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>res<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> true<span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> false<span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
<span class="br0">&#125;</span><span class="sy0">;</span>
        </div>
    </div>
</div>

<p>One subtle point of the implementation is determining whether a node still belongs to the document. Because an <span class="codebox"><code class="text">&lt;id, element&gt;</code></span> map is used to speed up lookups, the requested element may already have been removed from the document, in which case the map is left holding stale data.</p>
<p>When an element is removed from the document, it loses its parent, so its parent (pseudo-)node has the type <code>QDomNode::BaseNode</code>. This property is what the code uses to decide whether a cached element is still current.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sjinks.pro/c-cpp/qt/906-working-qdomdocument-elementbyid/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Generating an HTML Sitemap from XML: Part 2</title>
		<link>http://blog.sjinks.pro/seo/849-html-stemap-generation-from-xml-part-2/</link>
		<comments>http://blog.sjinks.pro/seo/849-html-stemap-generation-from-xml-part-2/#comments</comments>
		<pubDate>Mon, 08 Nov 2010 12:04:38 +0000</pubDate>
		<dc:creator>Vladimir</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XSL]]></category>

		<guid isPermaLink="false">http://blog.sjinks.pro/?p=849</guid>
		<description><![CDATA[How to limit each file to at most N links. A follow-up to the article “Generating an HTML Sitemap from XML”. In this part we look at how to make sure that each generated page contains no more than a given number of links. We will need: the XSL from the article “Converting a Sitemap into a URL List for siege”: &#60;?xml version=&#34;1.0&#34; encoding=&#34;utf-8&#34;?&#62; &#60;xsl:stylesheet version=&#34;1.0&#34; xmlns:xsl=&#34;http://www.w3.org/1999/XSL/Transform&#34; [...]]]></description>
			<content:encoded><![CDATA[<h2><em>How to limit each file to at most N links</em></h2>
<p>A follow-up to the article <strong>“<a href="http://blog.sjinks.pro/seo/847-html-stemap-generation-from-xml/">Generating an HTML Sitemap from XML</a>”</strong>.</p>
<p>In this part we look at how to make sure that each generated page contains no more than a given number of links.<span id="more-849"></span></p>
<p>We will need:</p>
<ul>
<li>The <a href="http://blog.sjinks.pro/tag/xsl/" class="st_tag internal_tag" rel="tag" title="Posts tagged XSL">XSL</a> from the article <strong>“<a href="http://blog.sjinks.pro/linux/722-transform-sitemap-to-siege-url-list/">Converting a Sitemap into a URL List for siege</a>”</strong>:
          
<div class="codebox">
    <div class="the_code" style="" id="p84914">
        <div class="code xml" id="p849code14">
<span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span><br />
<span class="sc3"><span class="re1">&lt;xsl:stylesheet</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">xmlns:xsl</span>=<span class="st0">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span> <span class="re0">xmlns:x</span>=<span class="st0">&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:output</span> <span class="re0">method</span>=<span class="st0">&quot;text&quot;</span> <span class="re0">media-type</span>=<span class="st0">&quot;text/plain&quot;</span> <span class="re0">indent</span>=<span class="st0">&quot;no&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:strip-space</span> <span class="re0">elements</span>=<span class="st0">&quot;*&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset/x:url&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:value-of</span> <span class="re0">select</span>=<span class="st0">&quot;x:loc&quot;</span> <span class="re0">disable-output-escaping</span>=<span class="st0">&quot;yes&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:text<span class="re2">&gt;</span></span></span><span class="sc1">&amp;#x000A;</span><span class="sc3"><span class="re1">&lt;/xsl:text<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<span class="sc3"><span class="re1">&lt;/xsl:stylesheet<span class="re2">&gt;</span></span></span>
        </div>
    </div>
</div>

</li>
<li>The XSL from the article <strong>“<a href="http://blog.sjinks.pro/seo/847-html-stemap-generation-from-xml/">Generating an HTML Sitemap from XML</a>”</strong>:
          
<div class="codebox">
    <div class="the_code" style="" id="p84915">
        <div class="code xml" id="p849code15">
<span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span><br />
<span class="sc3"><span class="re1">&lt;xsl:stylesheet</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">xmlns:xsl</span>=<span class="st0">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span> <span class="re0">xmlns:x</span>=<span class="st0">&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot;</span> <span class="re0">xmlns</span>=<span class="st0">&quot;http://www.w3.org/1999/xhtml&quot;</span> <span class="re0">exclude-result-prefixes</span>=<span class="st0">&quot;x&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:output</span> <span class="re0">method</span>=<span class="st0">&quot;xml&quot;</span> <span class="re0">indent</span>=<span class="st0">&quot;yes&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;UTF-8&quot;</span> <span class="re0">doctype-public</span>=<span class="st0">&quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot;</span> <span class="re0">doctype-system</span>=<span class="st0">&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:strip-space</span> <span class="re0">elements</span>=<span class="st0">&quot;*&quot;</span><span class="re2">/&gt;</span></span><br />
<br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;html<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;head<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;meta</span> <span class="re0">http-equiv</span>=<span class="st0">&quot;Content-Type&quot;</span> <span class="re0">content</span>=<span class="st0">&quot;text/html; charset=utf-8&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;title<span class="re2">&gt;</span></span></span>Site Map<span class="sc3"><span class="re1">&lt;/title<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/head<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;body<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;h1<span class="re2">&gt;</span></span><span class="re1">&lt;a</span> <span class="re0">href</span>=<span class="st0">&quot;http://blog.sjinks.pro/&quot;</span><span class="re2">&gt;</span></span>Site Map<span class="sc3"><span class="re1">&lt;/a<span class="re2">&gt;</span></span><span class="re1">&lt;/h1<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;ul<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:apply-templates</span> <span class="re0">select</span>=<span class="st0">&quot;x:url&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/ul<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/body<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/html<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset/x:url&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;li<span class="re2">&gt;</span></span><span class="re1">&lt;strong<span class="re2">&gt;</span></span><span class="re1">&lt;a</span> <span class="re0">href</span>=<span class="st0">&quot;{x:loc}&quot;</span><span class="re2">&gt;</span><span class="re1">&lt;xsl:value-of</span> <span class="re0">select</span>=<span class="st0">&quot;x:loc&quot;</span> <span class="re0">disable-output-escaping</span>=<span class="st0">&quot;yes&quot;</span><span class="re2">/&gt;</span><span class="re1">&lt;/a<span class="re2">&gt;</span></span><span class="re1">&lt;/strong<span class="re2">&gt;</span></span><span class="re1">&lt;/li<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<span class="sc3"><span class="re1">&lt;/xsl:stylesheet<span class="re2">&gt;</span></span></span>
        </div>
    </div>
</div>

</li>
<li>A bit of imagination <img src='http://static.sjinks.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>The algorithm is simple:</p>
<ol>
<li>Convert the original sitemap into a plain-text list of addresses (one URL per line)</li>
<li><a href="http://linux.die.net/man/1/split">Split</a> the file into several parts with a given number of lines each</li>
<li>Convert each of the resulting files into <a href="http://blog.sjinks.pro/tag/xml/" class="st_tag internal_tag" rel="tag" title="Posts tagged XML">XML</a></li>
<li>Convert the resulting XML files into <a href="http://blog.sjinks.pro/tag/html/" class="st_tag internal_tag" rel="tag" title="Posts tagged HTML">HTML</a></li>
</ol>
<p>Steps 3 and 4 can be combined: instead of generating an XML file, the HTML can be generated directly. Just in case, though, we will generate both the XML and the HTML.</p>
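<p>For illustration only, the combined variant of steps 3 and 4 could look roughly like the sketch below. It is an assumption about how one might do it, not the article's script: <code>chunk_to_html</code> is a hypothetical helper that turns one chunk of URLs into an HTML list fragment and escapes characters that are special in HTML.</p>

```shell
#!/bin/sh
# Sketch: turn one chunk of URLs (one per line) straight into an HTML list,
# skipping the intermediate XML file. chunk_to_html is a hypothetical helper.
chunk_to_html() {
    awk '
    BEGIN { print "<ul>" }
    {
        # escape characters that are special in HTML before printing the URL
        gsub(/&/, "\\&amp;")
        gsub(/</, "\\&lt;")
        gsub(/"/, "\\&quot;")
        print "<li><a href=\"" $0 "\">" $0 "</a></li>"
    }
    END { print "</ul>" }
    ' "$1"
}

# Usage example with a throwaway two-line chunk
tmp=$(mktemp)
printf '%s\n' 'http://example.com/' 'http://example.com/?a=1&b=2' > "$tmp"
chunk_to_html "$tmp"
rm -f "$tmp"
```

<p>The fragment still needs the surrounding page markup, which is one reason to keep the XSL stylesheet in charge of the HTML, as the article does.</p>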
<p>The resulting script:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p84916">
        <div class="code bash" id="p849code16">
<span class="co0">#! /bin/sh</span><br />
<br />
xsltproc <span class="re5">-o</span> urls.txt sitemap2txt.xsl sitemap.xml<br />
<span class="kw2">split</span> <span class="re5">-d</span> <span class="re5">-l</span> 100 urls.txt map<br />
<span class="kw1">for</span> i <span class="kw1">in</span> map<span class="sy0">*</span>; <span class="kw1">do</span><br />
&nbsp; &nbsp; <span class="kw2">awk</span> <span class="st_h">'<br />
&nbsp; &nbsp; BEGIN {<br />
&nbsp; &nbsp; &nbsp; &nbsp; print &quot;&lt;?xml version=\&quot;1.0\&quot;?&gt;\<br />
&lt;urlset\<br />
&nbsp; &nbsp; xmlns:xsi=\&quot;http://www.w3.org/2001/XMLSchema-instance\&quot;\<br />
&nbsp; &nbsp; xsi:schemaLocation=\&quot;http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\&quot;\<br />
&nbsp; &nbsp; xmlns=\&quot;http://www.sitemaps.org/schemas/sitemap/0.9\&quot;&gt;&quot;<br />
&nbsp; &nbsp; }<br />
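&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; # escape ampersands so that the generated XML stays well-formed<br />
&nbsp; &nbsp; &nbsp; &nbsp; gsub(/&amp;/, &quot;\\&amp;amp;&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; print &quot;&lt;url&gt;&lt;loc&gt;&quot;$1&quot;&lt;/loc&gt;&lt;/url&gt;&quot;<br />
&nbsp; &nbsp; }<br />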
&nbsp; &nbsp; END {<br />
&nbsp; &nbsp; &nbsp; &nbsp; print &quot;&lt;/urlset&gt;&quot;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; '</span> <span class="re1">$i</span> <span class="sy0">&gt;</span> site<span class="re1">$i</span>.xml<br />
&nbsp; &nbsp; xsltproc <span class="re5">-o</span> site<span class="re1">$i</span>.html sitemap2html.xsl site<span class="re1">$i</span>.xml<br />
<span class="kw1">done</span>
        </div>
    </div>
</div>
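<p>To see which file names the <code>split -d -l</code> call in the script produces, here is a small self-contained sketch. It assumes GNU coreutils (<code>-d</code> is not a POSIX option) and uses made-up <code>example.com</code> URLs; <code>demo_split</code> is a hypothetical helper name.</p>

```shell
#!/bin/sh
# Sketch: show how "split -d -l" names and sizes its output files.
# Assumption: GNU coreutils split and seq are available.
demo_split() {
    dir=$(mktemp -d)
    # 250 fake URLs, one per line
    seq 1 250 | sed 's|^|http://example.com/page/|' > "$dir/urls.txt"
    # at most 100 lines per chunk, numeric suffixes, "map" prefix
    ( cd "$dir" && split -d -l 100 urls.txt map && wc -l map* )
    rm -rf "$dir"
}

demo_split
```

<p>With 250 input lines and <code>-l 100</code> this yields <code>map00</code>, <code>map01</code> and <code>map02</code>, holding 100, 100 and 50 lines. The <code>for</code> loop in the script iterates over exactly these names.</p>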

<p>It's that simple!™</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sjinks.pro/seo/849-html-stemap-generation-from-xml-part-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Generating an HTML Sitemap from XML</title>
		<link>http://blog.sjinks.pro/seo/847-html-stemap-generation-from-xml/</link>
		<comments>http://blog.sjinks.pro/seo/847-html-stemap-generation-from-xml/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 01:22:20 +0000</pubDate>
		<dc:creator>Wandering Soul</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XSL]]></category>

		<guid isPermaLink="false">http://blog.sjinks.pro/?p=847</guid>
		<description><![CDATA[Helping bots find every page of a site. Problem: there is a fairly large site that has a sitemap in XML format. There is a bot that indexes this site, but it cannot find pages nested more than three levels deep. The bot needs help indexing the whole site. The simplest approach is to create a page that lists every page of the site and to place [...]]]></description>
			<content:encoded><![CDATA[<h2><em>Helping bots find every page of a site</em></h2>
<p><strong>Problem:</strong> there is a fairly large site that has a <a href="/sitemap.xml">sitemap in XML format</a>. There is a bot that indexes this site, but it cannot find pages nested more than three levels deep. The bot needs help indexing the whole site.</p>
<p>The simplest approach is to create a page that lists every page of the site and to link to it from the site footer. The ideal candidate for such a page is the sitemap in <a href="http://blog.sjinks.pro/tag/xml/" class="st_tag internal_tag" rel="tag" title="Posts tagged XML">XML</a> format. The problem is that not all bots bother to parse <a href="http://blog.sjinks.pro/tag/xml/" class="st_tag internal_tag" rel="tag" title="Posts tagged XML">XML</a> sitemaps. For such bots the sitemap has to be converted to <a href="http://blog.sjinks.pro/tag/html/" class="st_tag internal_tag" rel="tag" title="Posts tagged HTML">HTML</a>.<span id="more-847"></span></p>
<p>As in <a href="http://blog.sjinks.pro/linux/722-transform-sitemap-to-siege-url-list/">another similar case</a>, an <a href="http://blog.sjinks.pro/tag/xsl/" class="st_tag internal_tag" rel="tag" title="Posts tagged XSL">XSL</a> transformation comes to the rescue.</p>
<p>An example XSL stylesheet:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p84719">
        <div class="code xml" id="p847code19">
<span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span><br />
<span class="sc3"><span class="re1">&lt;xsl:stylesheet</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">xmlns:xsl</span>=<span class="st0">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span> <span class="re0">xmlns:x</span>=<span class="st0">&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot;</span> <span class="re0">xmlns</span>=<span class="st0">&quot;http://www.w3.org/1999/xhtml&quot;</span> <span class="re0">exclude-result-prefixes</span>=<span class="st0">&quot;x&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:output</span> <span class="re0">method</span>=<span class="st0">&quot;xml&quot;</span> <span class="re0">indent</span>=<span class="st0">&quot;yes&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;UTF-8&quot;</span> <span class="re0">doctype-public</span>=<span class="st0">&quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot;</span> <span class="re0">doctype-system</span>=<span class="st0">&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:strip-space</span> <span class="re0">elements</span>=<span class="st0">&quot;*&quot;</span><span class="re2">/&gt;</span></span><br />
<br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;html<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;head<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;meta</span> <span class="re0">http-equiv</span>=<span class="st0">&quot;Content-Type&quot;</span> <span class="re0">content</span>=<span class="st0">&quot;text/html; charset=utf-8&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;title<span class="re2">&gt;</span></span></span>Site Map<span class="sc3"><span class="re1">&lt;/title<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/head<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;body<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;h1<span class="re2">&gt;</span></span><span class="re1">&lt;a</span> <span class="re0">href</span>=<span class="st0">&quot;http://blog.sjinks.pro/&quot;</span><span class="re2">&gt;</span></span>Site Map<span class="sc3"><span class="re1">&lt;/a<span class="re2">&gt;</span></span><span class="re1">&lt;/h1<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;ul<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:apply-templates</span> <span class="re0">select</span>=<span class="st0">&quot;x:url&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:sort</span> <span class="re0">select</span>=<span class="st0">&quot;x:lastmod&quot;</span> <span class="re0">order</span>=<span class="st0">&quot;descending&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:apply-templates<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/ul<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/body<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/html<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset/x:url&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;li<span class="re2">&gt;</span></span><span class="re1">&lt;strong<span class="re2">&gt;</span></span><span class="re1">&lt;a</span> <span class="re0">href</span>=<span class="st0">&quot;{x:loc}&quot;</span><span class="re2">&gt;</span><span class="re1">&lt;xsl:value-of</span> <span class="re0">select</span>=<span class="st0">&quot;x:loc&quot;</span><span class="re2">/&gt;</span><span class="re1">&lt;/a<span class="re2">&gt;</span></span><span class="re1">&lt;/strong<span class="re2">&gt;</span></span><span class="re1">&lt;xsl:if</span> <span class="re0">test</span>=<span class="st0">&quot;x:lastmod&quot;</span><span class="re2">&gt;</span></span> (<span class="sc3"><span class="re1">&lt;xsl:value-of</span> <span class="re0">select</span>=<span class="st0">&quot;x:lastmod&quot;</span><span class="re2">/&gt;</span></span>)<span class="sc3"><span class="re1">&lt;/xsl:if<span class="re2">&gt;</span></span><span class="re1">&lt;/li<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<span class="sc3"><span class="re1">&lt;/xsl:stylesheet<span class="re2">&gt;</span></span></span>
        </div>
    </div>
</div>

<p>The transformation is run like this:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p84720">
        <div class="code bash" id="p847code20">
xsltproc <span class="re5">-o</span> <span class="sy0">/</span>path<span class="sy0">/</span>to<span class="sy0">/</span>sitemap.html <span class="sy0">/</span>path<span class="sy0">/</span>to<span class="sy0">/</span>sitemap2html.xml <span class="sy0">/</span>path<span class="sy0">/</span>to<span class="sy0">/</span>sitemap.xml
        </div>
    </div>
</div>

<p>You can schedule the <code>xsltproc</code> run with cron and enjoy the result.</p>
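<p>For a cron job it helps to wrap the call in a small script. The following is only a sketch: the function name <code>regen_sitemap_html</code> and the temporary-file naming are assumptions made for illustration, not part of the original setup. It writes to a temporary file and renames it only when <code>xsltproc</code> succeeds, so a failed run does not replace the previous page with a half-written one.</p>

```shell
#!/bin/sh
# Hypothetical cron wrapper (names are illustrative, not from the article).
# Usage: regen_sitemap_html STYLESHEET SITEMAP OUTPUT
regen_sitemap_html() {
    # Temporary file next to the target, so the final mv stays on one filesystem.
    tmp="$3.tmp.$$"
    if xsltproc -o "$tmp" "$1" "$2"; then
        # Success: replace the old page in a single rename.
        mv "$tmp" "$3"
    else
        # Failure (or xsltproc missing): keep the old page, drop the leftovers.
        rm -f "$tmp"
        return 1
    fi
}
```

<p>Saved as a script that ends with a call such as <code>regen_sitemap_html /path/to/sitemap2html.xml /path/to/sitemap.xml /path/to/sitemap.html</code>, it could be run from a crontab entry like <code>0 * * * * /path/to/regen-sitemap.sh</code>; the script path and the hourly schedule are placeholders.</p>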
]]></content:encoded>
			<wfw:commentRss>http://blog.sjinks.pro/seo/847-html-stemap-generation-from-xml/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Converting a sitemap into a URL list for siege</title>
		<link>http://blog.sjinks.pro/linux/722-transform-sitemap-to-siege-url-list/</link>
		<comments>http://blog.sjinks.pro/linux/722-transform-sitemap-to-siege-url-list/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 07:32:18 +0000</pubDate>
		<dc:creator>Vladimir</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XSL]]></category>

		<guid isPermaLink="false">http://blog.sjinks.pro/?p=722</guid>
		<description><![CDATA[Using XSL transformations to convert XML to text. Siege is a load-testing utility for web servers; its goal is to let developers check the performance and resource consumption of their code under conditions as close to real ones as possible. In its regression-testing and “Internet simulation” modes, siege reads the addresses to test from a text file. A sitemap would be a very convenient source for such a file, [...]]]></description>
			<content:encoded><![CDATA[<h2><em>Using <a href="http://blog.sjinks.pro/tag/xsl/" class="st_tag internal_tag" rel="tag" title="Posts tagged XSL">XSL</a> transformations to convert <a href="http://blog.sjinks.pro/tag/xml/" class="st_tag internal_tag" rel="tag" title="Posts tagged XML">XML</a> to text</em></h2>
<p><a href="http://joedog.org/index/siege-home">Siege</a> is a load-testing utility for web servers. Its goal is to let developers check the performance and resource consumption of their code under conditions as close to real ones as possible.</p>
<p>In its regression-testing and “Internet simulation” modes, <code>siege</code> reads the URLs to test from a plain-text file.</p>
<p>A sitemap would be a very convenient source for such a file but, unfortunately, <code>siege</code> does not understand XML. This article describes one possible way to convert a sitemap from XML to plain text.<span id="more-722"></span></p>
<p>As already noted, a sitemap is an XML file whose format is described in detail in the <a href="http://www.sitemaps.org/protocol.php">Sitemaps protocol</a>. The task is to convert that XML file to plain text, with one URL per line and no stray whitespace.</p>
<p>The most reliable approach is to use XSL transformations, which are native to XML. Unlike regular expressions, an XSL transformation keeps working correctly even when the sitemap uses the extended format (with elements from other namespaces).</p>
<p>The conversion needs the following XSL file:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p72225">
        <div class="code xml" id="p722code25">
<span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span><br />
<span class="sc3"><span class="re1">&lt;xsl:stylesheet</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">xmlns:xsl</span>=<span class="st0">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span> <span class="re0">xmlns:x</span>=<span class="st0">&quot;http://www.sitemaps.org/schemas/sitemap/0.9&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:output</span> <span class="re0">method</span>=<span class="st0">&quot;text&quot;</span> <span class="re0">media-type</span>=<span class="st0">&quot;text/plain&quot;</span> <span class="re0">indent</span>=<span class="st0">&quot;no&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:strip-space</span> <span class="re0">elements</span>=<span class="st0">&quot;*&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:template</span> <span class="re0">match</span>=<span class="st0">&quot;/x:urlset/x:url&quot;</span><span class="re2">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:value-of</span> <span class="re0">select</span>=<span class="st0">&quot;x:loc&quot;</span> <span class="re0">disable-output-escaping</span>=<span class="st0">&quot;yes&quot;</span><span class="re2">/&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;xsl:text<span class="re2">&gt;</span></span></span><span class="sc1">&amp;#x000A;</span><span class="sc3"><span class="re1">&lt;/xsl:text<span class="re2">&gt;</span></span></span><br />
&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/xsl:template<span class="re2">&gt;</span></span></span><br />
<span class="sc3"><span class="re1">&lt;/xsl:stylesheet<span class="re2">&gt;</span></span></span>
        </div>
    </div>
</div>

<p>The <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;xsl:output<span class="re2">&gt;</span></span></span></code></span> element sets the output format (plain text). <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;xsl:strip-space</span> <span class="re0">elements</span>=<span class="st0">&quot;*&quot;</span><span class="re2">/&gt;</span></span></code></span> tells the processor to drop whitespace-only text nodes in all elements, so the indentation between tags does not leak into the output. The <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;xsl:template</span><span class="re2">/&gt;</span></span></code></span> section matches every <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;url<span class="re2">&gt;</span></span></span></code></span> block and emits the value of its <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;loc<span class="re2">&gt;</span></span></span></code></span> element, appending a newline (<span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;xsl:text<span class="re2">&gt;</span></span></span><span class="sc1">&amp;#x000A;</span><span class="sc3"><span class="re1">&lt;/xsl:text<span class="re2">&gt;</span></span></span></code></span>) after each URL.</p>
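<p>To make the description concrete, here is a small sample to try the stylesheet on. It is a sketch with made-up data: the <code>example.com</code> URLs, the date and the helper name <code>write_sample_sitemap</code> are placeholders, not taken from a real site.</p>

```shell
#!/bin/sh
# Hypothetical helper: writes a tiny sitemap with placeholder URLs to FILE.
# Usage: write_sample_sitemap FILE
write_sample_sitemap() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2009-12-11</lastmod>
  </url>
  <url>
    <loc>http://example.com/search?q=xml&amp;page=2</loc>
  </url>
</urlset>
EOF
}

# With the stylesheet above saved as sitemap2txt.xml:
#   write_sample_sitemap sample.xml
#   xsltproc sitemap2txt.xml sample.xml
# prints one URL per line, with the entity decoded:
#   http://example.com/
#   http://example.com/search?q=xml&page=2
```

<p>The <code>&lt;lastmod&gt;</code> value does not appear in the result because the template outputs only <code>x:loc</code>.</p>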
<p>Save the stylesheet and point <span class="codebox"><code class="bash">xsltproc</code></span> at the sitemap (sitemap.xml, for example):</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p72226">
        <div class="code bash" id="p722code26">
xsltproc sitemap2txt.xml sitemap.xml <span class="re5">-o</span> urls.txt
        </div>
    </div>
</div>

<p>Voilà, the list is ready. It is important that the sitemap <em>does not contain</em> <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;?xml-stylesheet</span> <span class="re2">?&gt;</span></span></code></span> processing instructions.</p>
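<p>One quick way to check for such an instruction before running the conversion is sketched below. Both helper names are hypothetical, and <code>strip_stylesheet_pi</code> assumes the instruction sits on a line of its own, which is common but not guaranteed.</p>

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative).

# Succeeds (exit status 0) if FILE contains an xml-stylesheet instruction.
has_stylesheet_pi() {
    grep -q '<?xml-stylesheet' "$1"
}

# Prints FILE without the lines that contain the instruction.
# Assumption: the instruction occupies a whole line by itself.
strip_stylesheet_pi() {
    grep -v '<?xml-stylesheet' "$1"
}
```

<p>For example, <code>has_stylesheet_pi sitemap.xml &amp;&amp; strip_stylesheet_pi sitemap.xml &gt; clean.xml</code> writes a cleaned copy only when the instruction is present.</p>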
<p>Besides XSL transformations, there is another way that uses the stream editor <span class="codebox"><code class="bash"><span class="kw2">sed</span></code></span>, but it is rather fragile. It is a typical example of black magic.</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p72227">
        <div class="code bash" id="p722code27">
<span class="co0">#! /bin/sh</span><br />
<br />
<span class="kw2">sed</span> <span class="re5">-r</span> <span class="st_h">'s/&lt;loc/\n&lt;loc/g; s!&lt;/loc&gt;!&lt;/loc&gt;\n!g'</span> <span class="st0">&quot;$1&quot;</span> <span class="sy0">|</span> <span class="kw2">sed</span> <span class="re5">-r</span> <span class="re5">-n</span> <span class="st_h">'/&lt;loc&gt;.*?&lt;\/loc&gt;/! D; /&lt;loc&gt;.*?&lt;\/loc&gt;/ s!&lt;/?loc&gt;!!g; s!\s+!!g; P'</span>
        </div>
    </div>
</div>

<p>The magic works as follows: a newline is inserted before every opening tag and after every closing <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;/loc<span class="re2">&gt;</span></span></span></code></span> tag, which guarantees that each URL ends up on a line of its own. The result is piped to a second <span class="codebox"><code class="bash"><span class="kw2">sed</span></code></span> running in “quiet” mode. It deletes everything that is <em>not</em> inside <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;loc<span class="re2">&gt;</span></span></span>…<span class="sc3"><span class="re1">&lt;/loc<span class="re2">&gt;</span></span></span></code></span>, then strips the <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;loc<span class="re2">&gt;</span></span></span></code></span> tags themselves and, finally, removes any remaining whitespace and prints the result.</p>
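<p>The two stages are easier to follow when run separately. The sketch below uses the same <code>sed</code> commands as the script, split into two functions whose names (<code>split_loc</code>, <code>keep_loc</code>) are made up for this illustration. It assumes GNU <code>sed</code>, since <code>-r</code>, <code>\n</code> in the replacement and <code>\s</code> are GNU extensions, and the sample input is a placeholder.</p>

```shell
#!/bin/sh
# Assumes GNU sed. Function names are hypothetical.

# Stage 1: put every <loc>...</loc> pair on a line of its own.
split_loc() {
    sed -r 's/<loc/\n<loc/g; s!</loc>!</loc>\n!g'
}

# Stage 2: drop lines without a <loc> pair, strip the tags and all whitespace.
keep_loc() {
    sed -r -n '/<loc>.*?<\/loc>/! D; /<loc>.*?<\/loc>/ s!</?loc>!!g; s!\s+!!g; P'
}

# Example with a one-line sitemap fragment:
#   printf '%s\n' '<url><loc>http://example.com/a</loc></url>' | split_loc
# prints three lines:
#   <url>
#   <loc>http://example.com/a</loc>
#   </url>
# Piping that through keep_loc leaves only:
#   http://example.com/a
```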
<p>It is used like this:</p>
          
<div class="codebox">
    <div class="the_code" style="" id="p72228">
        <div class="code bash" id="p722code28">
sitemap2list.sh sitemap.xml <span class="sy0">&gt;</span> list.txt
        </div>
    </div>
</div>

<p>This will not work if the default namespace is not <span class="codebox"><code class="text">http://www.sitemaps.org/schemas/sitemap/0.9</code></span> (the elements then carry a prefix and no longer match the plain tag name), or if a <span class="codebox"><code class="xml"><span class="sc3"><span class="re1">&lt;loc<span class="re2">&gt;</span></span></span>…<span class="sc3"><span class="re1">&lt;/loc<span class="re2">&gt;</span></span></span></code></span> block spans several lines. XSLT is the better choice after all.</p>
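<p>There is one more difference to keep in mind: an XSLT processor decodes entity references, while <code>sed</code> passes them through untouched, so a URL written with <code>&amp;amp;</code> in the sitemap keeps the literal <code>&amp;amp;</code> in the list. A possible post-processing filter is sketched below. The name <code>decode_entities</code> is hypothetical, and the sketch assumes that only the five predefined XML entities occur (no numeric character references).</p>

```shell
#!/bin/sh
# Hypothetical filter: decodes the five predefined XML entities on stdin.
# &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
decode_entities() {
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# Example:
#   printf '%s\n' 'http://example.com/?a=1&amp;b=2' | decode_entities
# prints:
#   http://example.com/?a=1&b=2
```

<p>It could be appended to the script's pipeline, for instance <code>sitemap2list.sh sitemap.xml | decode_entities &gt; list.txt</code>, assuming the function is defined in the calling shell.</p>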
]]></content:encoded>
			<wfw:commentRss>http://blog.sjinks.pro/linux/722-transform-sitemap-to-siege-url-list/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

