This post originated from an RSS feed registered with Java Buzz
by Wilfred Springer.
Original Post: Xeger has arrived!!!
Feed Title: Distributed Reflections of the Third Kind
Feed URL: http://blog.flotsam.nl/feeds/posts/default/-/Java
Feed Description: Anything coming to my mind having to do with Java
Last Friday, I quickly tried to generate some test XML samples from an industry-standard XML Schema, using xmlgen. And it failed. Even though xmlgen is a great tool, it's currently not capable of accommodating every type definition encountered in the schema. More specifically, it fails to support schema definitions that include restrictions based on regular expressions, such as a type the schema documents (in Dutch) as "Tijdnotatie als hh:mm", that is, time notation as hh:mm.
To support schema definitions like these, xmlgen would basically have to reverse the regular expressions and generate text snippets that are valid according to the expression.
When I started to look around to see if there was something out there capable of doing that, I couldn't find anything like it in Java. Perl and Ruby had some support for it, but that's it. I asked around on Stack Overflow, but no solution showed up.
It made me wonder if I could roll my own. 10+ years ago, I did something similar when creating JavaScript validators for zip codes, phone numbers, etc. Back then, I basically constructed my own finite state machine to validate patterns (JavaScript didn't have regex support yet). The finite state machines also allowed me to generate valid samples, simply by randomly walking the state transitions.
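To make the random-walk idea concrete, here is a minimal Java sketch. It is not the original JavaScript code and not how Xeger is implemented internally; the class name, the `Transition` type and the hand-built machine for the pattern `[ab]{4,6}c` are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: a hand-built state machine for [ab]{4,6}c,
// walked at random to produce a string the pattern accepts.
class RandomWalkSketch {

    /** One labelled edge: emit `symbol`, then move to state `target`. */
    static final class Transition {
        final char symbol;
        final int target;

        Transition(char symbol, int target) {
            this.symbol = symbol;
            this.target = target;
        }
    }

    /** The single accepting state; it has no outgoing transitions. */
    static final int ACCEPT = 7;

    /** State n (0..6) means "n letters from [ab] emitted so far". */
    static List<List<Transition>> buildMachine() {
        List<List<Transition>> states = new ArrayList<>();
        for (int n = 0; n <= ACCEPT; n++) {
            List<Transition> out = new ArrayList<>();
            if (n < 6) {                 // still allowed to emit another a or b
                out.add(new Transition('a', n + 1));
                out.add(new Transition('b', n + 1));
            }
            if (n >= 4 && n <= 6) {      // 4 to 6 letters emitted: 'c' may close the string
                out.add(new Transition('c', ACCEPT));
            }
            states.add(out);
        }
        return states;
    }

    /** Picks a random outgoing transition at each state until ACCEPT is reached. */
    static String generate(Random random) {
        List<List<Transition>> machine = buildMachine();
        StringBuilder text = new StringBuilder();
        int state = 0;
        while (state != ACCEPT) {
            List<Transition> out = machine.get(state);
            Transition step = out.get(random.nextInt(out.size()));
            text.append(step.symbol);
            state = step.target;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String sample = generate(new Random());
        System.out.println(sample + " matches: " + sample.matches("[ab]{4,6}c"));
    }
}
```

In this toy machine the accepting state is a dead end, so the walk simply stops there. A general machine can have accepting states with outgoing transitions, in which case the walker also has to decide at random whether to stop or continue.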
So the question was whether there was anything out there that creates state machines from regular expressions. And then somebody pointed me to this project. It was exactly what I needed. Creating something capable of generating text out of these state transition definitions was a breeze. Xeger was born. (Xeger = opposite of Regex)
So this is how it works. You pass in a regex, and Xeger will generate random text matching this regular expression.
    String regex = "[ab]{4,6}c";
    Xeger generator = new Xeger(regex);
    String result = generator.generate();
    assert result.matches(regex);
In the example above, Xeger will generate Strings such as "aababc", "bbbbbbc", etc.
Next stop will be to complete my updated version of xmlgen. You can find Xeger here.