I Need To Parse This String - Topic

Jul 6 2016 03:18pm

the old stackoverflow isn't really helping me much
so i got this xslt type string, right?
unfortunately it's not a document, and this isn't something i'm too familiar with. if anyone has a reference that's worth a fuck, i'd appreciate it (w3schools does not fit this bill)

idk, i set up a stupid workaround today, but i used a standard text reader because i couldn't figure out how to get the parse to work right
i just need the string in quotes, "becker..."

Code

<t:Span Text="Becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker" />

i know not working with xml or json is a shortcoming of mine and i always seem to find ways around it but i would really like to know how to do it right

Code

<t:RadDocument xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:t="clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents" xmlns:s="clr-namespace:Telerik.Windows.Documents.Model.Styles;assembly=Telerik.Windows.Documents" xmlns:r="clr-namespace:Telerik.Windows.Documents.Model.Revisions;assembly=Telerik.Windows.Documents" xmlns:n="clr-namespace:Telerik.Windows.Documents.Model.Notes;assembly=Telerik.Windows.Documents" xmlns:th="clr-namespace:Telerik.Windows.Documents.Model.Themes;assembly=Telerik.Windows.Documents" version="1.4" LayoutMode="Paged" LineSpacing="1.15" LineSpacingType="Auto" ParagraphDefaultSpacingAfter="12" ParagraphDefaultSpacingBefore="0" SelectedBibliographicStyleName="\APA.XSL" StyleName="defaultDocumentStyle">
<t:RadDocument.Captions>
<t:CaptionDefinition IsDefault="True" IsLinkedToHeading="False" Label="Figure" LinkedHeadingLevel="0" NumberingFormat="Arabic" SeparatorType="Hyphen" />
<t:CaptionDefinition IsDefault="True" IsLinkedToHeading="False" Label="Table" LinkedHeadingLevel="0" NumberingFormat="Arabic" SeparatorType="Hyphen" />
</t:RadDocument.Captions>
<t:RadDocument.ProtectionSettings>
<t:DocumentProtectionSettings EnableDocumentProtection="False" Enforce="False" HashingAlgorithm="None" HashingSpinCount="0" ProtectionMode="ReadOnly" />
</t:RadDocument.ProtectionSettings>
<t:RadDocument.Styles>
<s:StyleDefinition DisplayName="defaultDocumentStyle" IsCustom="False" IsDefault="False" IsPrimary="True" Name="defaultDocumentStyle" Type="Default">
<s:StyleDefinition.ParagraphStyle>
<s:ParagraphProperties LineSpacing="1.15" SpacingAfter="12" />
</s:StyleDefinition.ParagraphStyle>
<s:StyleDefinition.SpanStyle>
<s:SpanProperties FontFamily="Verdana" FontSize="16" FontStyle="Normal" FontWeight="Normal" />
</s:StyleDefinition.SpanStyle>
</s:StyleDefinition>
<s:StyleDefinition DisplayName="Normal" IsCustom="False" IsDefault="True" IsPrimary="True" Name="Normal" Type="Paragraph" UIPriority="0" />
<s:StyleDefinition DisplayName="Table Normal" IsCustom="False" IsDefault="True" IsPrimary="False" Name="TableNormal" Type="Table" UIPriority="59">
<s:StyleDefinition.TableStyle>
<s:TableProperties CellPadding="5,0,5,0">
<s:TableProperties.TableLook>
<t:TableLook />
</s:TableProperties.TableLook>
</s:TableProperties>
</s:StyleDefinition.TableStyle>
</s:StyleDefinition>
</t:RadDocument.Styles>
<t:Section>
<t:Paragraph>
<t:Paragraph.ParagraphSymbolPropertiesStyle>
<s:SpanProperties FlowDirection="LeftToRight" />
</t:Paragraph.ParagraphSymbolPropertiesStyle>
<t:Span Text="Becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker" />
</t:Paragraph>
</t:Section>
</t:RadDocument>

AbDuCt

Jul 6 2016 05:47pm

Are you looking to extract that specific string, or to actually parse the entire document?

If the latter, parsing the document into nested hashes would probably be the starting point. I would strip the data of everything besides the tag names and start by parsing that into nested hashes. Once that is done, you can start parsing each line from the hash into its values, such as Text and Paragraph.

Perhaps make a class for each tag with setter/getters for each value that can be stored. For instance StyleDefinition can have a setter/getter for DisplayName, IsCustom, IsDefault, etc and Span can have the value Text.

The hardest part would likely be building the nested hash, since this requires some way to find the correct closing tag for each opening tag.
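
An XML parser already does the open/close tag matching, so the nested hash can be built by walking the parsed tree instead of pairing tags by hand. A rough sketch using LINQ to XML; the `ToHash` helper and the "Name" / "Attributes" / "Children" keys are invented here for illustration and are not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NestedHashSketch
{
    // Turns one element into a dictionary holding its tag name, its
    // attributes, and a list of child dictionaries built the same way.
    public static Dictionary<string, object> ToHash(XElement element)
    {
        return new Dictionary<string, object>
        {
            ["Name"] = element.Name.LocalName,
            ["Attributes"] = element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)   // skip xmlns:... declarations
                .ToDictionary(a => a.Name.LocalName, a => a.Value),
            ["Children"] = element.Elements()
                .Select(child => ToHash(child))
                .ToList()
        };
    }

    static void Main()
    {
        // Cut-down sample; the real message has many more elements.
        string xaml =
            "<t:Section xmlns:t=\"clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents\">" +
            "<t:Paragraph><t:Span Text=\"Becker becker\" /></t:Paragraph>" +
            "</t:Section>";

        var section = ToHash(XElement.Parse(xaml));
        var paragraph = ((List<Dictionary<string, object>>)section["Children"])[0];
        var span = ((List<Dictionary<string, object>>)paragraph["Children"])[0];
        var attributes = (Dictionary<string, string>)span["Attributes"];

        Console.WriteLine(attributes["Text"]);
    }
}
```

For pulling out a single attribute this is more structure than needed, but it shows that the tag matching itself can be left to the parser.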

------------------------------------------------------------------

Anyways, I don't have much else to say since you haven't told us much about what you want to do besides extracting a string (e.g. extracting just the string, writing your own parser, looking for an existing parser, etc.).

The XmlSerializer.Deserialize method might be another way to extract this, since it does look fairly XML-like. I believe I saw it as an answer on SO a long time ago.

This post was edited by AbDuCt on Jul 6 2016 05:48pm

Ideophobe

Jul 7 2016 07:26am

it's all 1 big string
i just want the string in the "t:Span Text=" tag
ya, i just want that 1 string everything else is style info

right now i'm pretty much reading until i run into

Code

<t:Span Text="

starting the read after it and going until

Code

" />

but i had seen you do some cool xpath work in the past and thought there might be a better way than a character scan to get this one string. parsing out the entire thing might be a bit of overkill though
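
For reference, the character scan described above can be written with two `IndexOf` calls. This is only a sketch: it assumes `Text` is the first attribute on the span and is double-quoted, as in the sample, and the `SpanTextScan` / `Extract` names are made up. One thing a raw scan misses is that attribute values are XML-escaped, so the result is decoded at the end:

```csharp
using System;
using System.Net;

static class SpanTextScan
{
    const string Prefix = "<t:Span Text=\"";

    // Returns the Text value of the first t:Span, or null if none is found.
    public static string Extract(string xaml)
    {
        int start = xaml.IndexOf(Prefix, StringComparison.Ordinal);
        if (start < 0) return null;
        start += Prefix.Length;

        // A literal quote inside the value is written as &quot;, so the
        // next raw quote character is the one that closes the attribute.
        int end = xaml.IndexOf('"', start);
        if (end < 0) return null;

        // Undo escaping such as &amp; and &quot;.
        return WebUtility.HtmlDecode(xaml.Substring(start, end - start));
    }

    static void Main()
    {
        string xaml = "<t:Paragraph><t:Span Text=\"Becker &amp; becker\" /></t:Paragraph>";
        Console.WriteLine(Extract(xaml)); // prints: Becker & becker
    }
}
```

`WebUtility.HtmlDecode` recognises more entities than XML defines, which is harmless here but is one more reason a real XML parser is the safer route.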

i tried to do something like this example from msdn but i couldn't get it to work out

Code

// Load the document and set the root element.
XmlDocument doc = new XmlDocument();
doc.Load("bookstore.xml");
XmlNode root = doc.DocumentElement;

// Add the namespace.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("bk", "urn:newbooks-schema");

// Select and display the first node in which the author's
// last name is Kingsolver.
XmlNode node = root.SelectSingleNode(
"descendant::bk:book[bk:author/bk:last-name='Kingsolver']", nsmgr);
Console.WriteLine(node.InnerXml);

but i couldn't figure out how to get the XmlNode to work

Code

XPathException: 'descendant::t:Span Text' has an invalid token.

Code

XmlNode node = root.SelectSingleNode(
"descendant::t:Span Text",nsmgr);

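The exception comes from the space: XPath reads `t:Span Text` as an element name followed by a stray token. Attributes are addressed with `@`, so the path would be `descendant::t:Span/@Text`. A minimal sketch under the assumption that the message is already in a string (the class and method names are made up for illustration):

```csharp
using System;
using System.Xml;

static class SpanTextXPath
{
    const string TelerikModelNs =
        "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents";

    // Returns the Text attribute of the first t:Span below the root,
    // or null if there is no such attribute.
    public static string Extract(string xaml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xaml); // LoadXml parses a string; Load expects a file name, stream or reader

        // The prefix used in the XPath must be mapped to the namespace URI.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("t", TelerikModelNs);

        XmlNode text = doc.DocumentElement.SelectSingleNode("descendant::t:Span/@Text", nsmgr);
        return text?.Value;
    }

    static void Main()
    {
        // Cut-down sample with the same namespace as the real message.
        string xaml =
            "<t:Section xmlns:t=\"" + TelerikModelNs + "\">" +
            "<t:Paragraph><t:Span Text=\"Becker becker\" /></t:Paragraph>" +
            "</t:Section>";

        Console.WriteLine(Extract(xaml));
    }
}
```
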
russian

Jul 7 2016 04:43pm

Code

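using System;
using System.Xml;

// 'message' is assumed to hold the XAML string pulled from the database.
XmlDocument xml = new XmlDocument();
xml.LoadXml(message); // LoadXml parses a string; Load expects a file name, stream or reader
XmlNode root = xml.DocumentElement;

// Map the "t" prefix to the namespace declared on the root element.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xml.NameTable);
nsmgr.AddNamespace("t", "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents");

// Select the first t:Span anywhere in the document and print its Text attribute.
var span = root.SelectSingleNode("//t:Span", nsmgr);
Console.Out.WriteLine(span.Attributes.GetNamedItem("Text").InnerText);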

Try that
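
The same lookup can also be done with LINQ to XML, which avoids the namespace manager. The sketch below additionally joins the text of every span, on the assumption that a longer message might be split across several `t:Span` elements; that assumption, and the `SpanTextLinq` / `ExtractAll` names, are not from the thread:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SpanTextLinq
{
    static readonly XNamespace T =
        "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents";

    // Concatenates the Text attribute of every t:Span in document order.
    // Spans without a Text attribute contribute an empty string.
    public static string ExtractAll(string xaml)
    {
        return string.Concat(
            XDocument.Parse(xaml)
                .Descendants(T + "Span")
                .Select(span => (string)span.Attribute("Text") ?? ""));
    }

    static void Main()
    {
        string xaml =
            "<t:Section xmlns:t=\"" + T.NamespaceName + "\">" +
            "<t:Paragraph>" +
            "<t:Span Text=\"Becker \" /><t:Span Text=\"becker\" />" +
            "</t:Paragraph>" +
            "</t:Section>";

        Console.WriteLine(ExtractAll(xaml)); // prints: Becker becker
    }
}
```

If paragraph breaks matter, the spans could be grouped by their parent `t:Paragraph` and joined with newlines instead.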

AbDuCt

Jul 7 2016 06:12pm

I am not too sure about xpaths/xml in C#, sorry, but if you absolutely give up you can just use regex:

Code

<t:Span Text=\"([^\"]*)\" \/>

Although this would match every t:Span Text attribute in the string, not just your node. AKA if you have any duplicates it will also retrieve those.
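
In C# the regex route might look like the sketch below. It assumes `Text` is the only attribute on the span and that the tag is written exactly as in the sample; the class and method names are invented. The capture uses `[^"]*` so each match stops at its own closing quote, where a greedy `.*` would run on to the last one on the line:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class SpanTextRegex
{
    // Pattern: <t:Span Text="..." /> with the value captured in group 1.
    static readonly Regex SpanText = new Regex("<t:Span Text=\"([^\"]*)\" />");

    // Returns the decoded Text value of every matching span, in order.
    public static string[] ExtractAll(string xaml)
    {
        return SpanText.Matches(xaml)
            .Cast<Match>()
            .Select(m => WebUtility.HtmlDecode(m.Groups[1].Value)) // undo &amp; etc.
            .ToArray();
    }

    static void Main()
    {
        string xaml = "<t:Span Text=\"one\" /><t:Span Text=\"two\" />";
        Console.WriteLine(string.Join(", ", ExtractAll(xaml))); // prints: one, two
    }
}
```
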

Ideophobe

Jul 8 2016 02:07am

Quote (russian @ Jul 7 2016 04:43pm)

Code

Try that

thanks a lot, your syntax makes a lot more sense than the msdn examples and the nastiness of characters i tried to mash together
i'll give it a shot in the morning and let ya know how it goes

Quote (AbDuCt @ Jul 7 2016 06:12pm)

I am not too sure about xpaths/xml in C#, sorry, but if you absolutely give up you can just use regex:

Code

<t:Span Text=\"([^\"]*)\" \/>

Although this would match every t:Span Text attribute in the string, not just your node. AKA if you have any duplicates it will also retrieve those.

ya, that's pretty much what i did as a temporary solution just to get the ticket finished. i wrote them as two separate splits cuz i was too lazy to figure out what the index would be if i split the whole string on _____" />_____, but ya, same idea
it should work fine. the string i'm parsing is a message sent across an app and stored in the database, and the span text is the contents of the message (what i want)
