PHP - Strip all elements inside a specific element when descendant elements have the same name as an ancestor


I'm using PHP to strip out the tags inside a specific tag and keep only the plain text. The issue I'm stuck on is that there are child tags with the same name as their parent tags:

<corpo>
    <num>1.</num>
    <mod id="mod167">
        string 1
        <commas id="mod167-vir1" type="word">string 2</commas>
        <com id="mod166-vir1-20090024-art13-com16.1"><num>&lt;&lt;16.</num></com>
        <rif xlink:href="urn" xlink:type="simple">string 3</rif><h:p>something here</h:p>
        <corpo>string 4</corpo>
    </mod>
</corpo>

Here, for example, corpo has a child tag with the same name (<corpo>string 4</corpo>), and the num tag is used twice (<num>1.</num> and <num>&lt;&lt;16.</num>) inside the parent tag corpo.

Starting from the highest corpo tag, I want to strip out every child tag and keep the plain text. The result should be:

<corpo>
    string 1 string 2 &lt;&lt;16. string 3 something here string 4
</corpo>

Up to now I have tried SimpleXML and PHP's strip_tags, adding the tags I want to keep, but of course that does not give the result I expect.

$result = strip_tags($xml, "<corpo></corpo>");
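To illustrate why this attempt falls short, here is a minimal sketch. The shortened input string is an assumption made for illustration, not the full document. strip_tags removes every tag that is not in the allowed list, so each corpo tag is kept, including the nested one:

```php
<?php
// Shortened sample input (illustrative only, not the full document).
$xml = '<corpo><num>1.</num><mod id="mod167">string 1 <corpo>string 4</corpo></mod></corpo>';

// Only <corpo> is allowed: num and mod are removed, but both the outer
// and the nested corpo element stay in the result.
$result = strip_tags($xml, '<corpo>');

echo $result, "\n";
echo substr_count($result, '<corpo>'), "\n"; // number of corpo start tags left
```

strip_tags works on the string level and has no notion of ancestors, so it cannot tell the outer corpo from the inner one.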

This is pretty much related to what @thw wrote, but more focused on SimpleXML. It shows a different angle on the XPath used to select the corpo element(s).

Given a document with the same or more ancestors than the one in the question, held as a string in $buffer, here is an example:

$xml = simplexml_load_string($buffer);

foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    $corpo[0] = dom_import_simplexml($corpo)->textContent;
}

$xml->asXML('php://output');

An exemplary output of this is:

<?xml version="1.0"?>
<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo>
            1.

                string 1
                string 2

                    &lt;&lt;16.

                string 3
                something here
                string 4

        </corpo>
    </b>
</a>

It works as follows:

Get each corpo element that has no ancestor of the same name. This is done with XPath:

//corpo[not(ancestor::corpo)] 
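As a rough check of what that predicate selects, the following sketch compares it with a plain //corpo query. The tiny sample document and the variable names are assumptions for illustration:

```php
<?php
// Tiny sample document (illustrative): one corpo nested inside another.
$xml = simplexml_load_string(
    '<a><corpo>outer <mod><corpo>inner</corpo></mod></corpo></a>'
);

// Every corpo element, at any depth.
$all = $xml->xpath('//corpo');

// Only corpo elements that are not nested inside another corpo.
$outermost = $xml->xpath('//corpo[not(ancestor::corpo)]');

echo count($all), ' ', count($outermost), "\n"; // prints "2 1"
```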

Then, for that SimpleXMLElement, you want the text content, which is accessible through the DOMElement node associated with $corpo:

dom_import_simplexml($corpo)->textContent;

The remaining expression

$corpo[0] = ...

just tells PHP to update the content of the SimpleXMLElement (a so-called self-reference).
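The self-reference can be tried in isolation. This is a minimal sketch on a made-up document (the root and item names are assumptions): assigning to offset 0 of the element replaces its child nodes with the given text.

```php
<?php
// Made-up sample document for illustration.
$xml = simplexml_load_string('<root><item><b>bold</b> and <i>italic</i></item></root>');

foreach ($xml->xpath('//item') as $item) {
    // Writing to offset 0 updates the element itself: its children are
    // removed and replaced with the plain text.
    $item[0] = 'bold and italic';
}

echo $xml->asXML();
```

A plain `$item = '...'` would only rebind the local variable and leave the document untouched, which is why the `[0]` offset is used.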

By the way, you could have used strip_tags($corpo->asXML()) here instead of dom_import_simplexml($corpo)->textContent, but I won't suggest it because I don't know how stable strip_tags is. It is at least not XML standard conformant.

Now you might want to apply whitespace normalization as well. preg_replace comes in handy here with the UTF-8 flag (u), because UTF-8 is the string encoding used by SimpleXMLElement and DOMElement:
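One concrete difference between the two approaches can be sketched like this (the small sample document and variable names are assumptions): strip_tags operates on the serialized markup, so entities stay encoded, while textContent returns the decoded character data.

```php
<?php
// Small sample (illustrative) containing escaped "<" characters.
$doc = simplexml_load_string('<a><com><num>&lt;&lt;16.</num></com></a>');
$el  = $doc->com;

// strip_tags only removes the tags from the markup string;
// the entities remain encoded.
$viaStripTags = strip_tags($el->asXML());

// textContent yields the decoded text of the node and its descendants.
$viaDom = dom_import_simplexml($el)->textContent;

echo $viaStripTags, "\n";
echo $viaDom, "\n";
```

If the strip_tags result were assigned back to the element, the already encoded entities would be escaped a second time, so the textContent route is the safer choice here.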

foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    $text     = dom_import_simplexml($corpo)->textContent;
    $corpo[0] = preg_replace('~\s+~u', ' ', $text);
}

This variant gives you:

<?xml version="1.0"?>
<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo> 1. string 1 string 2 &lt;&lt;16. string 3 something here string 4 </corpo>
    </b>
</a>
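The normalized text still carries one leading and one trailing space. If that is not wanted, a trim() can be added on top. This is an optional extra and not part of the variant above; the sample string is an assumption for illustration:

```php
<?php
// Sample text with line breaks, indentation and a non-ASCII character.
$text = "\n    1.\n        string 1\n    string 2 é\n";

// Collapse every run of whitespace into a single space (UTF-8 aware).
$normalized = preg_replace('~\s+~u', ' ', $text);

// Optionally drop the leading and trailing space as well.
$trimmed = trim($normalized);

var_dump($normalized, $trimmed);
```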

The full example at a glance:

<?php

$buffer = <<<XML
<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo>
            <num>1.</num>
            <mod id="mod167">
                string 1
                <commas id="mod167-vir1" type="word">string 2</commas>
                <com id="mod166-vir1-20090024-art13-com16.1">
                    <num>&lt;&lt;16.</num>
                </com>
                <rif xlink:href="urn" xlink:type="simple">string 3</rif>
                <h:p>something here</h:p>
                <corpo>string 4</corpo>
            </mod>
        </corpo>
    </b>
</a>
XML;

$xml = simplexml_load_string($buffer);

// Select only the outermost corpo elements (those without a corpo ancestor).
foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    // Read the text of the element and all of its descendants via DOM.
    $text     = dom_import_simplexml($corpo)->textContent;
    // Normalize whitespace and replace the element's children with the text.
    $corpo[0] = preg_replace('~\s+~u', ' ', $text);
}

$xml->asXML('php://output');
