
Not dead yet: xmltools, long after the summer

netbsd parsing xmlgrep xmlsed xmltools

# Pierre Bourdon
18.04.10, 22:03.

I did not find this story boring at all. :) Please keep up the hard work on this project; I can think of at least 42 tasks where xmlgrep and xmlsed could be very helpful (for example, I would really like to be able to create an intelligent grep for C files that works on the code's AST instead of on lines)!

By the way, I’ll try to find some time next week to create an xmltools-git package on the AUR (Arch User Repository).

[ tag:blog.huoc.org,2010-04-18:comments/1271621009.30221 ]

xmlsed prototype

api xmlsed xmltools

# Nobody
20.10.09, 23:36.

“This will work as expected.”

Indeed, and this is as described in xml_pattern(7), which I should have read more carefully. Sorry about that.

“it’s just that I have a lot to work on and other areas need attention right now”

As I said, I understand that. :)

“Anyway, have you tried xmlsed a bit or toyed with more advanced features (subpatterns, groups, etc.)?”

I did try xmlsed, and it worked fine. (And I wish this tool had existed three years ago; I bet it could have saved me some headaches back when I was trying to handle inconsistently structured XML files!)

But in both cases, I haven’t tried anything overly complicated yet, only simple tasks that require quite simple statements.

[ tag:blog.huoc.org,2009-10-20:1256074616.13991 ]

# rz0
20.10.09, 09:54.

Well, it’s not that it doesn’t handle prefixes, it’s more like… as I was saying in the post, the syntax is a bit of a mess. You can only use the short name predicate syntax with alphanumeric names; if you want to match anything else, you have to write it like this:

$ xmlgrep '$="prefix:a"'

This will work as expected. I know the syntax is pretty obscure; aside from me, there is probably nobody who knows the entire syntax, actually. Even David writes suboptimal patterns from time to time. :)

About the "two-way match" feature, I’ll see if I can do something about it, but it’ll probably have to wait. It’s not that it’s hard to do, it’s just that I have a lot to work on and other areas need attention right now (e.g. getting the syntax to something that people will actually be able to use without telling me "I can’t do that" while in fact, they can, only it’s hidden somehow).

Anyway, have you tried xmlsed a bit or toyed with more advanced features (subpatterns, groups, etc.)?

[ tag:blog.huoc.org,2009-10-20:1256025293.11318 ]

# Damien
19.10.09, 23:29.

Unfortunately, it seems that xmlgrep currently does not recognize any node whose name contains a namespace prefix.

Here is the simplest test case. Without namespace prefix:

$ xmlgrep a <<< '<a/>'
<a/>

And with a dummy prefix:

$ xmlgrep 'prefix:a' <<< '<prefix:a/>'

As for the two-way match that you suggest, yes, that’s exactly the behaviour I would love to see. :)

My problem is, I’m not always the author of the XML files I have to process, and I don’t know in advance what prefix the author has associated with a particular namespace. So it would be very nice if I could just invoke xmlgrep as follows:

$ xmlgrep -n "prefix=http://www.example.org" \
  'prefix:a[prefix:b/.="2008"]' file.xml

and have it “do the right thing” even if, in file.xml, the namespace "http://www.example.org/" is associated with another prefix.

Of course, I realize this is not a trivial feature to implement, and if you don’t see a point in it, I will understand that you focus on what matters to you. :)
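
For illustration only, the prefix-independent matching requested above can be sketched in Python with the standard xml.etree.ElementTree module. This is not xmlgrep and says nothing about how xmlgrep might implement it. The sample document, its `ex` prefix and the helper name `match_a_with_b` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the file's author bound the namespace to "ex",
# not to the "prefix" name used on the query side.
DOC = """\
<ex:root xmlns:ex="http://www.example.org">
  <ex:a><ex:b>2008</ex:b></ex:a>
  <ex:a><ex:b>2009</ex:b></ex:a>
</ex:root>"""

# Query-side binding: only the URI has to agree with the document.
NS = {"prefix": "http://www.example.org"}


def match_a_with_b(xml_text, year):
    """Return every 'a' element (in the namespace) having a child 'b' equal to year."""
    root = ET.fromstring(xml_text)
    # ElementTree resolves "prefix:" through NS, so it compares URIs, not prefixes.
    return root.findall(f".//prefix:a[prefix:b='{year}']", NS)


matches = match_a_with_b(DOC, "2008")
print(len(matches))  # prints 1
```

The query matches even though the document never uses the name `prefix`, because both sides are reduced to the namespace URI before comparison. Note that the URI is compared as a plain string, so a trailing slash on one side only would prevent a match.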

[ tag:blog.huoc.org,2009-10-19:1255987782.11318 ]

# rz0
19.10.09, 21:02.

Thanks for the comment. As for XML namespaces, what special treatment would you expect? At the moment, namespaces are not treated differently from other elements; you can just match them as prefixes. Of course, this completely ignores the URI specification associated with the namespace prefix, but in my opinion, having to match the URI rather than the prefix would be way too verbose and impractical for most uses. We could imagine more complex schemes like a two-way match; e.g. I specify that prefix X stands for uri_X, the document specifies that x be used for uri_X, and the parser automatically converts node predicates in patterns from X:whatever to x:whatever. But the question is: is that useful enough to justify the added complexity? Do you have a precise use case in mind?
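
To make the two-way match idea concrete, here is a rough Python sketch, not xmltools code. It collects the prefixes a document declares, then rewrites `X:` in a pattern to whatever prefix the document uses for the same URI. The function names `document_prefixes` and `translate_pattern`, the sample document and the naive regular expression are all assumptions for illustration; a real implementation would work on the parsed pattern rather than on its text.

```python
import io
import re
import xml.etree.ElementTree as ET


def document_prefixes(xml_text):
    """Map each namespace URI to the first prefix the document declares for it."""
    uri_to_prefix = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield (prefix, uri) pairs; the default namespace has prefix "".
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        uri_to_prefix.setdefault(uri, prefix)
    return uri_to_prefix


def translate_pattern(pattern, user_ns, xml_text):
    """Rewrite 'X:name' so that it uses the document's prefix for the same URI."""
    doc = document_prefixes(xml_text)

    def swap(match):
        uri = user_ns.get(match.group(1))
        if uri is None or uri not in doc:
            return match.group(0)  # unknown prefix or URI: leave untouched
        doc_prefix = doc[uri]
        # A default namespace in the document means the names carry no prefix.
        return f"{doc_prefix}:" if doc_prefix else ""

    # Naive: a name followed by ':' and then a name start character.
    return re.sub(r"\b([A-Za-z_][\w.-]*):(?=[A-Za-z_])", swap, pattern)


doc = '<ex:a xmlns:ex="http://www.example.org"><ex:b>2008</ex:b></ex:a>'
user_ns = {"X": "http://www.example.org"}
print(translate_pattern("X:a", user_ns, doc))  # prints ex:a
```

The user-side prefix `X` never has to appear in the document; only the URI ties the two together, which is the added bookkeeping the question above is weighing.
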
[ tag:blog.huoc.org,2009-10-19:1255978946.11318 ]

# Damien
18.10.09, 19:32.

Hello,

Very promising tools. I tested them for a while on my own XML files, and they worked pretty well. :)

Do you plan to add support for XML namespaces?
[ tag:blog.huoc.org,2009-10-18:1255887161.7462 ]