Alice: I have enjoyed getting a bird's eye view of your clop.rb
file. Let's get a little closer to the ground now. Where shall we swoop
down?
Bob: I suggest that we continue our tour on the level of the Clop class,
before descending all the way to the internal workings of the individual
options, the machinery of which is contained in the Clop_option
class.
However, more than half of the Clop class code lines are dedicated to
the help facility. It is not necessary to look at these lines in order to
understand how normal options are being parsed. So I suggest that we
continue our tour in three easy journeys. First we inspect how a normal
option is handled on the Clop level. Second, we descend to the
Clop_option level, to see how the corresponding option block
is parsed and used. Third, we go back to the Clop level in order to
figure out how the help facility works.
Alice: Sounds good to me!
Bob: The first journey is by far the simplest, and shortest. Of the
three actions ordered in the Clop initializer:
we have already seen how the first action parse_option_definitions
consisted in handing all the work to the initializer one level lower,
through a call to Clop_Option.new. So that part will be visited
in our second journey.
Similarly, we have seen that the request for the third action also was
handed down directly to the individual options on the Clop_Option
level. All we have to do in our first journey is to figure out how the
method parse_command_line_options works.
Alice: Can you show me this method again?
Bob: Here it is:
The first two if and elsif branches concern the help facility, which
we will address in our third journey. So we only have to inspect the
following three methods here, during our first journey: find_option
and parse_option and initialize_global_variables.
Here is the first one:
Alice: The top part is clear. You hand it a string that contains
something like "-d" or "--step_size". I presume
that the option class Clop_option has a method longname
that returns exactly the string "--step_size" and a method
shortname that similarly returns "-d".
Bob: Well presumed!
Alice: Now if the option is recognized as the long name version
of option i in the option array, the value i is returned, as it
should be. But what happens with the short name?
Ah, wait, before you answer my question, let me think. This must be
connected with the fact that you allow for short options to be glued
to their values. For example "-d0.001" would be a valid
format.
Bob: Indeed, even though a user would not be likely to write it that
way, since it does look a bit confusing. However, if we allow
"-n3", we should allow "-d0.001" as well.
Alice: Agreed. So I understand that you want to check only whether
the -d part is present in the string s, while that string
is allowed to contain more. Now you do that by turning the shortname
of the option into a regular expression.
Bob: Yes: if you want to compare two strings, the proper and clean
way to do so in Ruby is to change the string at the right-hand side into a
regular expression. This is like converting an integer into a floating point
number. In a way, nothing changes, except that now it has become an instance
of a different class. For the number, an Integer instance has become a Float
instance, and here in our case, a String instance has become a Regexp
instance.
Alice: And the comparison operator =~ returns true if
@options[x].shortname is indeed contained in the string s.
Bob: Yes, except that it returns the position of the first character
of the match, rather than true. But what concerns us here is that it
does not return nil, which would be interpreted as false; anything
that is not nil or false is considered to be true. Even the
null string "" is true in Ruby, another thing to watch out
for if you are a C programmer.
Alice: And a more logical use of the notion of true, if you ask me.
A non-null string is still more than nothing.
Bob: Yes, I agree, though it took me a while to shake the C habit.
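The return value of =~ and Ruby's notion of truth can be checked directly. The sketch below uses two hypothetical helpers, match_position and truthy?, which are introduced only for illustration and are not part of clop.rb.

```ruby
# Hypothetical helper: position at which the short option name occurs
# inside a command line token, or nil if it does not occur at all.
def match_position(shortname, token)
  token =~ Regexp.new(shortname)
end

# Hypothetical helper: how Ruby treats a value in a condition.
def truthy?(value)
  value ? true : false
end

p match_position("-d", "-d0.001")                # 0
p match_position("-d", "--step_size")            # nil
p match_position("-n", "--number_of_particles")  # 1

# Only nil and false count as false; 0 and "" are both true.
p truthy?(0)    # true
p truthy?("")   # true
p truthy?(nil)  # false
```

Note that a match at position 0 is still a true value in Ruby, so the bare result of =~ can be used as a condition without comparing it to anything.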
Alice: Now I think I understand all about this find_option method,
except for that last cryptic bit, and $` == "". What is that doing
there? And what does it do?
Bob: Ah, that is a nice addition, if I may say so myself. At first
I had not put that in, but when I looked at this method, without that
addition, I had the feeling that something wasn't right. When I thought
about it, I realized that there was still a possibility for ambiguity.
Alice: Like?
Bob: Like having an option with a long form --number_of_particles
and a short form -n. Can you see what would happen in that case?
Alice: Let me inspect. Ah! Yes, of course. In the case of the long form,
you still match correctly against -n, as the second and third
character of the long form. How devious!
But wait a minute. If you first check the long form, you could bypass the
check for the short form, by turning the two if statements into an
if...else statement.
Bob: Yes, that would work in the specific case I just mentioned, where
there is only a confusion between the two ways of writing the same option.
But what if there is a possible confusion between two different options?
Here is an example. Let there be another option with a long form called
--neutron_star_type. Now that option, too, matches -n.
So we have to protect different options from each other, and we cannot
assume safety just by shadowing the short option check by the long option
check.
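The ambiguity can be made concrete with a small stand-in for the option array. In this sketch the options are plain hashes rather than Clop_option instances, and the short name -t for --neutron_star_type is an assumption made only for the example.

```ruby
# Stand-in option table; the real code keeps option objects in @options.
# The short name "-t" is assumed here, it does not come from the text.
def sample_options
  [
    { longname: "--number_of_particles", shortname: "-n" },
    { longname: "--neutron_star_type",   shortname: "-t" },
  ]
end

# Unanchored test: a short name counts if it occurs anywhere in the token.
# Returns the indices of all options whose short name is found.
def loose_hits(token)
  sample_options.each_index.select do |x|
    token =~ Regexp.new(sample_options[x][:shortname])
  end
end

p loose_hits("-n3")                    # [0]
p loose_hits("--number_of_particles")  # [0], but only by accident
p loose_hits("--neutron_star_type")    # [0], which is the wrong option
```

In the last call the string -n is found as the second and third character of a long name that belongs to a different option, which is exactly the confusion described above.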
Alice: You are right! But I still don't understand the syntax of your
solution. I would have checked whether the match started at the beginning.
Didn't you say that the match attempt returns the position of the first
character of a successful match?
Bob: Indeed. And you are right. I could have written
Alice: I see. That is good to know. I guess those rather cryptic
shorthands are borrowed from Perl.
Bob: I think so.
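A short sketch of the two equivalent tests, comparing the match position with 0 or checking that the text before the match is empty. The helper names are hypothetical; only =~ and the variables $`, $& and $' come from Ruby itself.

```ruby
# Returns [text before the match, the match, text after the match],
# or nil when the pattern does not match at all.
def match_parts(token, pattern)
  return nil unless token =~ pattern
  [$`, $&, $']
end

# Test 1: the match must start at position 0.
def opens_by_position?(token, shortname)
  (token =~ Regexp.new(shortname)) == 0
end

# Test 2: the part of the string before the match must be empty.
def opens_by_prematch?(token, shortname)
  if token =~ Regexp.new(shortname) and $` == ""
    true
  else
    false
  end
end

p match_parts("-n3", /-n/)                    # ["", "-n", "3"]
p match_parts("--number_of_particles", /-n/)  # ["-", "-n", "umber_of_particles"]
p opens_by_position?("--neutron_star_type", "-n")  # false
p opens_by_prematch?("--neutron_star_type", "-n")  # false
p opens_by_prematch?("-n3", "-n")                  # true
```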
Alice: Okay, I now fully understand how find_option works. On to the
next station of our first journey!
Bob: Here is the next station. After we know which option we are dealing
with, we have to parse it. This happens in the following method:
Now this is a bit more complicated, since there are several forks in the
road. The first fork is related to the question: is the type of the
option boolean? In other words: are we dealing with a flag? A flag can
only be true or false. By naming the flag as a command line option,
the user intends to set the flag, i.e., to give it the value true. By
leaving out that option, the user intends to keep the default value false.
For example, in our N-body code, the user can ask for extra
diagnostics by including the option -x, which leads to the
corresponding global variable $x_flag as we have specified
already. By default $x_flag = false. If the option -x
is encountered, we have to change this variable to $x_flag = true.
This happens by setting the valuestring of the boolean option to true
as you can see at the beginning of the code fragment above.
Alice: This valuestring is probably implemented as a string
@valuestring within the Clop_option class, and there
that string is used later to obtain the actual value?
Bob: All correct, as we will see during our second journey, but you don't
have to rely on that, on this level: it could have been implemented in
a different way, as far as the Clop class is concerned. The only important
thing is that there is a `setter' method provided for the Clop_option
class, that somehow sets the internal information of the Clop_option
instance in such a way as to guarantee that the boolean value of the option,
when asked for later, will return true.
Hmm, that sounded more complicated than it really is. Often things are
much clearer on the code level than when you try to express it in words.
Alice: The same is true in mathematical equations, of course, once you
understand all the symbols . . .
Bob: . . . and once you are sufficiently familiar with manipulating the
symbols that they are becoming old friends.
Alice: Yes, until that point it is still helpful to have clumsy sentences
in a natural language to help you get the idea. So, please continue
to be clumsy, and tell me what happens next. We have encountered a fork
in the road. If the option is boolean, we set it to true without needing
to read anything more from the command line, and we happily return.
Bob: And if the option is not boolean, we take the other fork in the road,
by continuing the travel through the method parse_option.
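The first fork can be sketched in a few lines. This is an illustration under stated assumptions, not the method from clop.rb: the option is a plain hash, and the type name "bool" is assumed.

```ruby
# Sketch of the first fork in the road. Returns true if the option was a
# flag and has been dealt with, false if a value still has to be read.
# The type name "bool" is an assumption made for this example.
def set_if_flag(option)
  return false unless option[:type] == "bool"
  option[:valuestring] = "true"   # naming a flag on the command line sets it
  true
end

x_flag = { type: "bool", valuestring: "false" }
n      = { type: "int",  valuestring: nil }

p set_if_flag(x_flag)     # true
p x_flag[:valuestring]    # "true"
p set_if_flag(n)          # false
p n[:valuestring]         # nil
```

The value is stored as the string "true" rather than as a boolean, in line with the description of valuestring as something that is interpreted later.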
Alice: Ah, I see, if the type of the option is not boolean, you have
to extract the value from the next little bit of command line information,
by accessing arg_array. But wait a minute, I see two lines
where you assign something to @options[i].valuestring, no, three
lines; one at the very bottom too.
Ah, that last one deals with vectors, and you already explained that vectors
are special, in that their value can be spread out over different bits of
string in the command line. So let's leave that for later. But what about
these two assignments of @options[i].valuestring right in the middle?
Bob: The main assignment, the one you should look at first, is this one:
In most cases, after encountering a new option name, you just read in the
value corresponding to that option, as the next little string that came
from the command line. If there is nothing left to be parsed on the command
line, that just means that the user has forgotten to provide a value: an
error message is printed, and execution of the code is halted.
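A minimal sketch of this normal case, assuming the remaining command line words sit in an array. The helper name next_value and the wording of the error message are hypothetical.

```ruby
# Hypothetical helper: take the next command line word as the value of
# the option called name, or stop with an error if nothing is left.
def next_value(name, arg_array)
  value = arg_array.shift
  if value.nil?
    raise ArgumentError, "option \"#{name}\" requires a value, but no value was given"
  end
  value
end

rest = ["3", "-x"]
p next_value("-n", rest)  # "3"
p rest                    # ["-x"]

# No check is made on what the next word looks like:
p next_value("-n", ["-x"])  # "-x"
```

The last call shows that whatever comes next is taken as the value, even if it looks like another option.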
Alice: But what happens if the user provides a next option, instead of the
value for the previous option? Imagine that the user writes -n -x.
Bob: In that case, an attempt will be made to set the number of particles
to -x, which will result in something silly. But hey, we can't
protect the user from all possible errors! I don't know how to anticipate
on this level what is and is not correct. Others, using this code in
the future, will undoubtedly use it for more general purposes than I
can currently envision, so I don't want to constrain too much what can
and cannot be said.
Alice: Hmmm. You could at least insist that a valid number would be
provided when the type of a variable is given as an int or float.
Bob: Perhaps. We could come back to those questions later, and try
to make everything industrial-strength. For the time being, I'm happy
if everything works under reasonably normal circumstances with
reasonably intelligent users.
Alice: Well, if you talk about users that don't make errors, then
I have to conclude that nobody fits the criterion of being `reasonably
intelligent'. But okay, for now let's move on. I'd probably want to
come back to this point later, though.
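One possible form of the stricter check that Alice suggests, sketched with Ruby's built-in Integer() and Float() conversions, which raise an ArgumentError for strings that are not valid numbers. The helper name checked_number and the type names "int" and "float" are assumptions; nothing like this is claimed to be in clop.rb.

```ruby
# Hypothetical helper: convert a value string according to the declared
# type, and complain if the string is not a valid number of that type.
def checked_number(type, valuestring)
  case type
  when "int"
    Integer(valuestring)
  when "float"
    Float(valuestring)
  else
    valuestring          # other types are passed through unchecked
  end
rescue ArgumentError
  raise ArgumentError, "expected a value of type #{type}, got \"#{valuestring}\""
end

p checked_number("int", "3")        # 3
p checked_number("float", "0.001")  # 0.001

begin
  checked_number("int", "-x")
rescue ArgumentError => e
  puts e.message
end
```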
Bob: Now if you look just above the two lines I quoted above, you find:
This addresses the case where a one-character option is used, without
any space separating the option and the value, as in -n3, a very
compact notation which we already discussed before.
Alice: What is the meaning of this funny looking repetition of the
symbols ^-? They occur twice, with a square bracket in between,
and a closing bracket at the end, as ^-[^-].
Bob: This is one of the most confusing aspects in the notation of regular
expressions, this overloading of the meaning of the up-arrow
^. In fact, the two up-arrows here are two completely different
things. In order to see this, let us inspect the whole regular
expression:
Alice: Yes, that notation I am familiar with. But how can you start
at the beginning of a line for the second time?
Bob: You don't. Within square brackets, a leading up-arrow ^ has the
effect of negating the set of characters that follows. So the combination
[^-] simply means: any character but the - character!
In other words, by writing
Bob: You'll get used to it.
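The two meanings of ^ can be tried out in a few lines. The helper name single_hyphen_option? is hypothetical; the pattern is the one discussed above.

```ruby
# True if the token starts with exactly one hyphen followed by some other
# character. The first ^ anchors the match at the start; the ^ inside the
# brackets negates the set, so [^-] is any character except a hyphen.
def single_hyphen_option?(token)
  !(token =~ /^-[^-]/).nil?
end

p single_hyphen_option?("-n")      # true
p single_hyphen_option?("-n3")     # true
p single_hyphen_option?("--nono")  # false
p single_hyphen_option?("-")       # false
```

The last case is false because [^-] still has to match one character, and a lone hyphen has nothing after it.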
Alice: Now that I understand the first half of the first line, let me
stare at both lines again:
You have told me that the variable $' contains the rest of the
string, the part after the part which matched. So if we start with the
option "-n", and if we insist that it should start with one and
only one hyphen, then $' = n, right?
Bob: Wrong.
Alice: Huh?
Bob: Try it!
Alice: Okay:
Bob: Why don't you try the compact option-value notation -n3?
Alice: Here goes:
Bob: What happened is that the matching attempt s =~ /^-[^-]/
involves two characters: first the hyphen and then the next character,
for which it is checked that it is not a hyphen.
Alice: Ah, although in plain English we can describe this match as
`a check that there is one and only one hyphen', in fact it is a match
where the first two characters are being checked as being an ordered
pair `hyphen followed by non-hyphen.'
Now I see what happened. And since this all happens in the case of a
one-character option, the non-hyphen that gets eaten is the option
character, so that what is left is exactly the value that needs to be
assigned to the variable corresponding to the option.
So what you do at the end of this complicated line, is that you check
whether the remainder, stored in $' contains at least one
alphanumeric character or underscore, which is what the \w
stands for.
Bob: Exactly.
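Wrapped in a small hypothetical helper, glued_value, the compact case can be tested on its own. The condition is the one under discussion; the helper and its return convention (nil when no value is glued on) are assumptions.

```ruby
# Returns the value glued to a one-character option, as in "-n3",
# or nil if the token carries no such value.
def glued_value(token)
  if token =~ /^-[^-]/ and (value = $') =~ /\w/
    value
  else
    nil
  end
end

p glued_value("-n3")          # "3"
p glued_value("-d0.001")      # "0.001"
p glued_value("-n")           # nil
p glued_value("--step_size")  # nil
```

For "-n" the first match succeeds, but the remainder is the empty string, which contains no word character, so the method returns nil and the value has to come from the next command line word.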
Alice: Okay, I see now what happens. But I think you could have written
this in a simpler way.
Bob: How?
Alice: Instead of
Alice: I just find it confusing, rather than interesting, but to each
his own taste! Let's move on to the last case, at the end.
Bob: This is a lot simpler. Here we are dealing with the case that
the option type is that of a float vector, a vector of the type
we have defined before, with components that are all floating point numbers.
As I already mentioned, a vector on the command line should be given in
Ruby array notation, with the numbers enclosed between square brackets,
[].
There is a lot of freedom for the user: the vector can be
written as a string, like "[3, 5]", or without those double
quotes directly as [3, 5]. The numbers can be comma separated,
but they can also just be space separated, as in [3 5]. Spaces
are allowed next to the brackets: [ 3 5] and [3 5 ] and
[ 3 5 ] are all equally fine.
There is one catch to be aware of, when you leave off the double
quotes: on the command line [ 3,5] and [3, 5] and
[3,5 ] are all fine, but [3,5] is likely to give you
an error message.
Alice: Why?
Bob: It depends on the Unix shell you use, but chances are that the shell
tries to interpret this as an attempt to address files in the current
directory. Unless you happen to have a file with the name 3 or
a file with the name 5, an expression on the command line
containing [3,5] will probably generate a short dry message
No match.
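The shell's view of [3,5] can be imitated from within Ruby with File.fnmatch, which applies shell-style wildcard rules to a single name. This is only an approximation of what any particular shell does; the helper name glob_hit? is hypothetical.

```ruby
# Hypothetical helper: does a file name match a shell-style pattern?
def glob_hit?(pattern, name)
  File.fnmatch(pattern, name)
end

# As a wildcard pattern, [3,5] is a set of three single characters.
p glob_hit?("[3,5]", "3")      # true
p glob_hit?("[3,5]", "5")      # true
p glob_hit?("[3,5]", ",")      # true
p glob_hit?("[3,5]", "[3,5]")  # false
```

So the pattern stands for a file named 3, 5 or a single comma, and not for the literal text the user typed.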
Alice: That's good: short and simple, and it makes it clear that there
is no subtle Ruby bug involved.
As for your implementation, let me look at what you wrote for vector
parsing:
You allow some flexibility in writing the type: the words float and
vector may be separated by one space, by several spaces, or even by
a tab.
Bob: Sure, it would seem too restrictive to insist on one literal way
of writing it. I can easily see someone adding an extra space between
the two words, and perhaps a tab or whatever would strike them as looking
better. I have consistently given the users that freedom, also in parsing
the lines within the Clop_Option class, as we will see in our
next journey.
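A whitespace-tolerant type test of this kind might look as follows. The exact pattern used in clop.rb is not shown here, so the regular expression and the helper name float_vector_type? are assumptions.

```ruby
# True if the type string reads "float vector", with any amount of
# white space (spaces or tabs) between the two words.
def float_vector_type?(type)
  !(type =~ /^float\s+vector$/).nil?
end

p float_vector_type?("float vector")     # true
p float_vector_type?("float   vector")   # true
p float_vector_type?("float\tvector")    # true
p float_vector_type?("float")            # false
```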
Alice: And then you keep shifting new content from the ARGV array
until you finally encounter a string that contains a closing square
bracket ]. During that whole process, you keep adding what
you find to the valuestring of the option you are working with, so
that you build up the whole vector again, from the bits and pieces
from the command line that were stored in successive elements of the
ARGV array, here called argv_array.
One last question: why don't you just string those strings together?
What is the need for adding a " " between the bits and pieces?
Bob: If all the vector elements were comma separated, as in
[2,3,4], there would be no need to do so. However, I give
the user the flexibility to use a space separated notation as well.
Take the example of a vector written as [2 3]. In the ARGV
array, this will be distributed over two elements, the first being
"[2" and the second one "3]". Now if you would just
string those two strings together, as you suggested, you would get
"[23]", a one-dimensional vector with one element, 23.
Not what you wanted.
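The gathering of vector pieces can be sketched as a loop that keeps shifting words until a closing bracket has been seen. The helper name collect_vector is hypothetical, and the guard against a missing bracket is an addition for the sake of a self-contained example.

```ruby
# Rebuild a vector from successive command line words, putting a space
# between the pieces, until a piece containing "]" has been added.
def collect_vector(first_piece, arg_array)
  valuestring = first_piece
  while valuestring !~ /\]/
    piece = arg_array.shift
    raise ArgumentError, "vector has no closing bracket" if piece.nil?
    valuestring += " " + piece
  end
  valuestring
end

rest = ["3]", "-x"]
p collect_vector("[2", rest)     # "[2 3]"
p rest                           # ["-x"]
p collect_vector("[2,3,4]", [])  # "[2,3,4]"

# Without the separating space the two numbers would run together:
p "[2" + "3]"                    # "[23]"
```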
Alice: I see. Good! Now I believe there is one station left on our
first journey?
4. The First Journey: Clop, the Non-help Part
4.1. Three Journeys
4.2. Inspecting find_option
4.3. The Last Cryptic Bit
i = x if (s =~ Regexp.new(@options[x].shortname)) == 0
However, I preferred to use the $` variable. After every
successful match, the matched part of the string is assigned to the
variable $&, while the part of the string before the match
is assigned to $` and the part of the string following the
match to $'. So I just checked whether $` was
equal to the empty string:
i = x if s =~ Regexp.new(@options[x].shortname) and $` == ""
4.4. Inspecting parse_option
4.5. Extracting the Value: Normal Case
4.6. Extracting the Value: Compact Case
if s =~ /^-[^-]/ and (value = $') =~ /\w/
@options[i].valuestring = value
/^-[^-]/
The first ^ specifies the beginning of the string. The presence
of - immediately following means that the string has to start with
a - sign. Now the square brackets are normally used to give
you a choice, as in [aei] or [a-f]. In [aei]
it is understood that any of the three letters a or e or i could be
present and still form a match. And in [a-f], any letter in the
range a, b, c, ..., f would form a valid match.
if s =~ /^-[^-]/
we ask whether it is true that the string s begins with a hyphen,
but does not begin with two consecutive hyphens. Let me show you:
|gravity> irb
irb(main):001:0> "-n" =~ /^-[^-]/
=> 0
irb(main):002:0> "--nono" =~ /^-[^-]/
=> nil
Alice: Ah, very nice, though difficult to parse for a human like me.
4.7. Interesting or Confusing?
if s =~ /^-[^-]/ and (value = $') =~ /\w/
@options[i].valuestring = value
|gravity> irb
irb(main):001:0> s = "-n"
=> "-n"
irb(main):002:0> s =~ /^-[^-]/
=> 0
irb(main):003:0> $'
=> ""
Hey, that is strange! Why should it be the empty string? What happened
to n ?
irb(main):004:0> s = "-n3"
=> "-n3"
irb(main):005:0> s =~ /^-[^-]/
=> 0
irb(main):006:0> $'
=> "3"
Somehow the n gets eaten up and disappears without a trace, but the
3 survives.
if s =~ /^-[^-]/ and (value = $') =~ /\w/
you could have used
if s =~ /^-\w/ and (value = $') =~ /\w/
Bob: Ah, I had not thought about that. I guess I was just too fixated
on hyphenation! But, now that I figured out how to do it, I find my
double hat trick, or double up arrow if you like, quite elegant. Or
at least interesting.
4.8. Extracting the Value: Vector Case