I Need To Parse This String > Cant Figure Out C#
Member
Posts: 14,631
Joined: Sep 14 2006
Gold: 575.56
Jul 6 2016 03:18pm
the old stackoverflow isn't really helping me much
so i got this xml type string right? (xaml markup from a telerik RadDocument)
unfortunately it's a string and not a document file, and this isn't something i'm too familiar with. if anyone has a reference that's worth a fuck (w3schools does not fit this bill) i'd appreciate it

idk i set up a stupid thing today but i used a standard text reader because i couldn't figure out how to get the parse to work right
i need just the string in quotes, "Becker...", from this line:
Code
<t:Span Text="Becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker" />


i know not working with xml or json is a shortcoming of mine and i always seem to find ways around it but i would really like to know how to do it right

Code

<t:RadDocument xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:t="clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents" xmlns:s="clr-namespace:Telerik.Windows.Documents.Model.Styles;assembly=Telerik.Windows.Documents" xmlns:r="clr-namespace:Telerik.Windows.Documents.Model.Revisions;assembly=Telerik.Windows.Documents" xmlns:n="clr-namespace:Telerik.Windows.Documents.Model.Notes;assembly=Telerik.Windows.Documents" xmlns:th="clr-namespace:Telerik.Windows.Documents.Model.Themes;assembly=Telerik.Windows.Documents" version="1.4" LayoutMode="Paged" LineSpacing="1.15" LineSpacingType="Auto" ParagraphDefaultSpacingAfter="12" ParagraphDefaultSpacingBefore="0" SelectedBibliographicStyleName="\APA.XSL" StyleName="defaultDocumentStyle">
<t:RadDocument.Captions>
<t:CaptionDefinition IsDefault="True" IsLinkedToHeading="False" Label="Figure" LinkedHeadingLevel="0" NumberingFormat="Arabic" SeparatorType="Hyphen" />
<t:CaptionDefinition IsDefault="True" IsLinkedToHeading="False" Label="Table" LinkedHeadingLevel="0" NumberingFormat="Arabic" SeparatorType="Hyphen" />
</t:RadDocument.Captions>
<t:RadDocument.ProtectionSettings>
<t:DocumentProtectionSettings EnableDocumentProtection="False" Enforce="False" HashingAlgorithm="None" HashingSpinCount="0" ProtectionMode="ReadOnly" />
</t:RadDocument.ProtectionSettings>
<t:RadDocument.Styles>
<s:StyleDefinition DisplayName="defaultDocumentStyle" IsCustom="False" IsDefault="False" IsPrimary="True" Name="defaultDocumentStyle" Type="Default">
<s:StyleDefinition.ParagraphStyle>
<s:ParagraphProperties LineSpacing="1.15" SpacingAfter="12" />
</s:StyleDefinition.ParagraphStyle>
<s:StyleDefinition.SpanStyle>
<s:SpanProperties FontFamily="Verdana" FontSize="16" FontStyle="Normal" FontWeight="Normal" />
</s:StyleDefinition.SpanStyle>
</s:StyleDefinition>
<s:StyleDefinition DisplayName="Normal" IsCustom="False" IsDefault="True" IsPrimary="True" Name="Normal" Type="Paragraph" UIPriority="0" />
<s:StyleDefinition DisplayName="Table Normal" IsCustom="False" IsDefault="True" IsPrimary="False" Name="TableNormal" Type="Table" UIPriority="59">
<s:StyleDefinition.TableStyle>
<s:TableProperties CellPadding="5,0,5,0">
<s:TableProperties.TableLook>
<t:TableLook />
</s:TableProperties.TableLook>
</s:TableProperties>
</s:StyleDefinition.TableStyle>
</s:StyleDefinition>
</t:RadDocument.Styles>
<t:Section>
<t:Paragraph>
<t:Paragraph.ParagraphSymbolPropertiesStyle>
<s:SpanProperties FlowDirection="LeftToRight" />
</t:Paragraph.ParagraphSymbolPropertiesStyle>
<t:Span Text="Becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker becker" />
</t:Paragraph>
</t:Section>
</t:RadDocument>
Member
Posts: 13,425
Joined: Sep 29 2007
Gold: 0.00
Warn: 20%
Jul 6 2016 05:47pm
Are you looking to extract that specific string, or to actually parse the entire document?

If the latter, parsing the document into nested hashes would probably be the starting point. I would strip the data of everything besides the tag names and start by parsing that into nested hashes. Once that is done you can start parsing each line from the hash into its values, such as Text and Paragraph.

Perhaps make a class for each tag with setters/getters for each value that can be stored. For instance, StyleDefinition can have a setter/getter for DisplayName, IsCustom, IsDefault, etc., and Span can have the value Text.

The hardest part would likely be creating the nested hash, since this requires some way to find the correct closing tag for each opening tag.
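
On the closing-tag problem: an XML parser already pairs opening and closing tags, so the nesting does not have to be worked out by hand. The following is a minimal sketch of that idea using LINQ to XML, assuming the markup is well-formed XML held in a string and a compiler that supports top-level statements (C# 9 or later). The helper name `DescribeTree` and the trimmed-down sample string are hypothetical and not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns one line per element, indented by nesting depth,
// showing the element's local name and its attributes (namespace declarations skipped).
static List<string> DescribeTree(string markup)
{
    var lines = new List<string>();
    XDocument doc = XDocument.Parse(markup); // throws XmlException if the tags do not pair up

    foreach (XElement el in doc.Descendants())
    {
        int depth = el.Ancestors().Count();
        string attrs = string.Join(", ",
            el.Attributes()
              .Where(a => !a.IsNamespaceDeclaration)
              .Select(a => a.Name.LocalName + "=" + a.Value));
        lines.Add(new string(' ', depth * 2) + el.Name.LocalName +
                  (attrs.Length > 0 ? " [" + attrs + "]" : ""));
    }
    return lines;
}

// Assumed sample in the shape of the posted document, heavily shortened.
string sample =
    "<t:RadDocument xmlns:t=\"clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents\">" +
    "<t:Section><t:Paragraph><t:Span Text=\"Becker becker\" /></t:Paragraph></t:Section>" +
    "</t:RadDocument>";

foreach (string line in DescribeTree(sample))
    Console.WriteLine(line);
```

Each `XElement` already knows its children and attributes, so a per-tag class or nested hash could be filled from this walk if the whole document is really needed.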

------------------------------------------------------------------

Anyways I don't have much else to say since you haven't told us much about what you want to do besides extracting a string (e.g., extracting just the string simply, writing your own parser, looking for an existing parser, etc.).

The XmlSerializer.Deserialize method might be another way to extract this since it does look fairly XML-like. I believe I saw it as an answer on SO a long time ago.

This post was edited by AbDuCt on Jul 6 2016 05:48pm
Member
Posts: 14,631
Joined: Sep 14 2006
Gold: 575.56
Jul 7 2016 07:26am
it's all 1 big string
i just want the string in the "t:Span Text=" tag
ya, i just want that 1 string everything else is style info

right now i'm pretty much reading until i run into
Code
<t:Span Text="

starting the read after it and going until
Code
" />
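
The scan described above (find the opening marker, then read up to the closing marker) might look roughly like this in C#. It is only a sketch, assuming the message is already in a string, the markers appear exactly as written in the post, and top-level statements (C# 9 or later) are available. The helper name `TryScanSpanText` and the sample string are made up for illustration.

```csharp
using System;

// Hypothetical helper: copies the text between the first `<t:Span Text="` and the
// next `" />` into `text`. Returns false if either marker is missing.
static bool TryScanSpanText(string markup, out string text)
{
    const string open = "<t:Span Text=\"";
    const string close = "\" />";
    text = "";

    int start = markup.IndexOf(open, StringComparison.Ordinal);
    if (start < 0) return false;
    start += open.Length; // skip past the opening marker itself

    int end = markup.IndexOf(close, start, StringComparison.Ordinal);
    if (end < 0) return false;

    text = markup.Substring(start, end - start);
    return true;
}

// Assumed sample, shortened from the posted document.
string message = "<t:Paragraph><t:Span Text=\"Becker becker\" /></t:Paragraph>";
if (TryScanSpanText(message, out string found))
    Console.WriteLine(found); // prints: Becker becker
```

One limitation of a plain scan: the result is the raw attribute text, so escaped characters such as `&amp;` stay escaped, whereas an XML parser would decode them.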


but i had seen you do some cool xpath work in the past and thought there might be a better way than a character scan to get this one string. parsing out the entire thing might be a bit of overkill though

i tried to do something like this example from msdn but i couldn't get it to work out

Code
// Load the document and set the root element.
XmlDocument doc = new XmlDocument();
doc.Load("bookstore.xml");
XmlNode root = doc.DocumentElement;

// Add the namespace.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("bk", "urn:newbooks-schema");

// Select and display the first node in which the author's
// last name is Kingsolver.
XmlNode node = root.SelectSingleNode(
"descendant::bk:book[bk:author/bk:last-name='Kingsolver']", nsmgr);
Console.WriteLine(node.InnerXml);


but i couldn't figure out how to get the XmlNode to work. this is the error i get, and the call that throws it:

Code
XPathException: 'descendant::t:Span Text' has an invalid token.


Code
XmlNode node = root.SelectSingleNode(
"descendant::t:Span Text",nsmgr);
Member
Posts: 7,324
Joined: Dec 22 2002
Gold: 1,261.00
Jul 7 2016 04:43pm
Code
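using System;
using System.Xml;

XmlDocument xml = new XmlDocument();
// Load reads from a file path; if the markup is already in a string variable, pass it to xml.LoadXml(...) instead.
xml.Load("bookstore.xml");
XmlNode root = xml.DocumentElement;

// Map the "t" prefix to the exact namespace URI declared on the root element.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xml.NameTable);
nsmgr.AddNamespace("t", "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents");

// Select the first t:Span element anywhere in the document, then read its Text attribute.
var span = root.SelectSingleNode("//t:Span", nsmgr);
Console.Out.WriteLine(span.Attributes.GetNamedItem("Text").InnerText);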



Try that
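
Since the original poster says the markup is one big string rather than a file, a string-based variant of this approach might look like the sketch below. It assumes the namespace URI shown in the posted document and top-level statements (C# 9 or later); `GetSpanTexts` and the sample string are hypothetical names used only for illustration. It also selects the attribute directly with `@Text`, which is the XPath form of what the failing `t:Span Text` expression was trying to say.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: returns the Text attribute of every t:Span, in document order.
// An empty list means no t:Span with a Text attribute was found.
static List<string> GetSpanTexts(string markup)
{
    var doc = new XmlDocument();
    doc.LoadXml(markup); // LoadXml parses a string; Load expects a file path, stream or reader

    var nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("t", "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents");

    var texts = new List<string>();
    // "@Text" selects the attribute node of each t:Span element.
    foreach (XmlNode attr in doc.SelectNodes("//t:Span/@Text", nsmgr))
        texts.Add(attr.Value);
    return texts;
}

// Assumed sample, shortened from the posted document.
string sample =
    "<t:RadDocument xmlns:t=\"clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents\">" +
    "<t:Section><t:Paragraph><t:Span Text=\"Becker &amp; becker\" /></t:Paragraph></t:Section>" +
    "</t:RadDocument>";

foreach (string text in GetSpanTexts(sample))
    Console.WriteLine(text); // prints: Becker & becker
```

Returning a list sidesteps the null check that `SelectSingleNode` would need when no span is present, and it also covers messages that contain more than one span.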
Member
Posts: 13,425
Joined: Sep 29 2007
Gold: 0.00
Warn: 20%
Jul 7 2016 06:12pm
I am not too sure about XPath/XML in C#, sorry, but if you absolutely give up you can just use regex:

Code
<t:Span Text=\"(.*?)\" \/>


Although this would match every t:Span node with a Text attribute, not just your node. In other words, if you have any duplicates it will also retrieve those.
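
In C# the regex route could be sketched as follows. This assumes the span is always written exactly as `<t:Span Text="..." />` with `Text` as its only attribute, and that top-level statements (C# 9 or later) are available; `MatchSpanTexts` and the sample string are hypothetical. The capture uses the character class `[^"]*`, which cannot run past a closing quote, so two spans on the same line come back as two separate matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the captured Text value of every matching span.
static List<string> MatchSpanTexts(string markup)
{
    var texts = new List<string>();
    // Group 1 captures everything up to (not including) the next double quote.
    foreach (Match m in Regex.Matches(markup, "<t:Span Text=\"([^\"]*)\" />"))
        texts.Add(m.Groups[1].Value);
    return texts;
}

// Assumed sample with two spans on one line.
string sample = "<t:Span Text=\"first\" /><t:Span Text=\"second\" />";
foreach (string text in MatchSpanTexts(sample))
    Console.WriteLine(text);
```

As with any text-level match, the captured value is the raw attribute text, so XML escapes such as `&amp;` are not decoded.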
Member
Posts: 14,631
Joined: Sep 14 2006
Gold: 575.56
Jul 8 2016 02:07am
Quote (russian @ Jul 7 2016 04:43pm)
Code
XmlDocument xml = new XmlDocument();
xml.Load("bookstore.xml");
XmlNode root = xml.DocumentElement;

XmlNamespaceManager nsmgr = new XmlNamespaceManager(xml.NameTable);
nsmgr.AddNamespace("t", "clr-namespace:Telerik.Windows.Documents.Model;assembly=Telerik.Windows.Documents");

var span = root.SelectSingleNode("//t:Span", nsmgr);
Console.Out.WriteLine(span.Attributes.GetNamedItem("Text").InnerText);


Try that

thanks a lot, your syntax makes a lot more sense than the msdn examples and the nastiness of characters i tried to mash together
i'll give it a shot in the morning and let ya know how it goes

Quote (AbDuCt @ Jul 7 2016 06:12pm)
I am not too sure about XPath/XML in C#, sorry, but if you absolutely give up you can just use regex:

Code
<t:Span Text=\"(.*?)\" \/>


Although this would match every t:Span node with a Text attribute, not just your node. In other words, if you have any duplicates it will also retrieve those.


ya, that's pretty much what i did as a temporary solution just to get the ticket finished. i wrote them as two separate splits cuz i was too lazy to figure out what the index would be if i split the whole string on _____" />_____, but ya same idea
it should work fine. the string i'm parsing from is a message sent across an app and stored in the database, and the span text is the contents of the message (what i want)