Previous | ToC | Up | Next |
Bob: Hi Alice, are you still convinced that you find a way to avoid
repetition in a command line option parser?
Alice: Hi Bob! I gave it a good deal of thought, and I came to the
conclusion that we were both right.
Bob: How can that be?
Alice: You were right in pointing out that the three places in your
program did three quite different things: parsing information, passing
it on to the computer, and optionally passing it to the user through a
help facility.
At the same time, I was right in pointing out the danger of having the
information about that information scattered around in those three
different places. I called it a violation of the DRY principle, the
notion of Don't Repeat Yourself. But I guess you did not commit a hard
violation of the principle, since you did not literally repeat yourself.
Perhaps you could call it a soft violation. The problem I objected to
was the fact that you mentioned the same option in three different places.
And even though you did something different concerning that option in
each of those different places, there still is the danger that if you
change the functionality of that option, you can introduce subtle bugs
if you don't update all three places correctly.
Bob: Yes, I agree that there is that danger.
Alice: In fact, when I looked at your code, I realized that you actually
deal with each option four times! I only saw the first three, when I
scanned the actual mention of each option, but at the end you use the option
information once again.
Bob: Can you show me?
Alice: Take the time step information, for example. First it appears
in your help description of the -d option:
Then it occurs a second time as the first option listed in the call to
the parser, which have loaded through the getoptlong package:
The third time, we encounter this same -d option in the inner loop
of the read_options method, in the lines:
What I had not realized yesterday was the fact that at the end, you echo
all the values that are set, before you start the integration. That is
certainly a good thing to do, since it allows the user to see explicitly
what parameter values were used by the integrator, including default values
that were not set by the user. But you see, again the time step shows up,
this time not through a mention of -d, but through the second
line in the initial print statement:
Bob: But why should that bother you?
Alice: imagine that you want to change the internal way to store the
data. Or more seriously, imagine that someone else wants to adapt your
program for additional applications, and therefore wants to change the
internal way to store the data. Instead of assigning the time step to
the variable dt, perhaps she wants to assign that value to a variable
dynamics_dt, since she also has a stellar evolution module for
which she is using a time step evolution_dt.
Now she has to realize that she has to change the appropriate line in
the inner loop of the read_options method, and in the second
line of the STDERR.print command. In those two places, dt
occurs three times, and she has to realize that she has to change
two of the three as follows: in the read_options place she
has to write:
Bob: Hmmm. I must admit that there is that possibility, yes.
Alice: So this is what I meant when I said that we were both right.
You are doing four different things in four different places, so
in that sense you are not repeating anything. At the same time, you
deal with the same variable in those four different places, either
through their command line option or their internal representation.
Bob: You are saying that I do repeatedly something different with
the same piece of information.
Alice: Exactly.
Bob: And it would be better if we could avoid that.
Alice: Indeed.
Bob: But I come back to my question: how can we avoid that?
It would be terribly clumsy to do all four actions for the first
option first, then to do those for actions for the second option,
and so on. In that way, you could keep the information for one
option all close together, but you would have to write a copy of
all four commands for each new option!
Alice: That would not be a good solution, I agree. We have to think
of something better.
Bob: You always like to come up with principles. Can't you think of a
principle that will help us out here?
Alice: Now that you challenge me, perhaps we can use Ruby itself as an
example. Ruby is built on the principle of indirect addressing, which is
why it is so flexible. Perhaps we could avoid using the information itself
in the four places that I pointed out. How about storing the real
information in a fifth place? If we can find a way to access that
fifth place in the other four instances where we need the information,
we can keep things under control.
Bob: Ah, you mean that the code writer has write access to the data
in that fifth place, while the other four places only have read access
to those data?
Alice: I guess, yes, that is a good way to put it. I mentioned the case of
someone wanting to adapt the code for a different purpose. Rather
than changing it in the two different places I mentioned above, writing
Alice: Yes. This is the principle of indirect addressing, or the
principle of indirection, I guess.
Bob: That sounds like a lack of direction to me. But lets forget
about naming your principle -- we could leave that open as well, and
indirectly address your principle later.
Alice: I see you still don't like principles, but you must admit,
this one gave us a new idea.
Bob: I admit. So how to go about this? Ah, each option would have
such a definition, right? And each option would need more than the
information of the name of the internal variable. At a minimum, it
would have to know about the name of the external handle, namely the
name of the option on the command line argument; in our case -d
or in longer form --step_size.
Alice: I can see from the look in your eyes that you're getting ready
to code something up!
Bob: And each option would need a type, for the input to work properly.
Even though Ruby has dynamic typing, someone somewhere has to tell Ruby
that the number of particles, when read in from the command line, has
to be converted to an integer, while the name of the integration method
retains the type of a string, and the time step size becomes a floating
point number.
Hmm. This becomes really interesting. So each option will be characterized
by a single block of information, which could be an instance of a new class.
And it could contain help information as well!
Alice: Ah, yes, of course. And if you would change either the functionality
or just the names of some of the information in a block, you would naturally
modify the help message as well, to reflect the changes you made. Since
everything lives together in one paragraph, so to speak, there is no obstacle
against keeping things up-to-date together.
Bob: We could even have more than one help level. For example, typing
"-h" could lead to a one-line help message being displayed, while typing
"--help" could give you a more detailed multi-line message.
Alice: So all the information would be bundled: the internal representation,
to be used by the computer when running a program, the specification
of the command line interface, to get the information from the user into
the computer, and help messages, to get information back to the user.
I like it!
Bob: Not only that, there is another flow of information back to the user,
when a program starts running and echoes its initial state, as you pointed
out earlier. So each block could have a `print name' for its internal
variable as well. For example, the number of particles could be specified
on the command line by typing -n 3 for an N-body system, or
--number_of_particles 3 or something like that. Internally the
variable storing that information might be n_part. But when you
echo the initial state, it is much more natural to just type N = 3.
This could be specified by a block with the lines
Alice: Very nice! In fact, when you have several options, and you give each
option such a detailed Long description, those together in effect
form a kind of man page, like in the Unix system, a form of manual page
that summarizes the interface to the program. And keeping all that
information within the code itself will be a form of insurance.
We all know of cases where the manual page for a code says one thing,
while the code does something else, because someone modified the code
and did not bother to update the man page. Now if the manual information
lives in the very same place where the actual information about the main
variables is kept, there is no longer any barrier against keeping
information up to date.
Bob: Even I can bring up the discipline to keep documentation up to date
in that way, I expect. And now that you mention man pages, a natural thing
would be to add examples in the Long description. How about:
Bob: Easy! We can insist on two blank lines between options. If we
insist that the Long description allows appears at the very
end of a block, then a single blank line means that the Long
description still continues, whereas a double blank line means that
we now start the next block, for a new option.
Alice: I think you have found a great way to make a top-down specification
for a user interface for all of our programs! Before we write a
program to parse all that information, how about going all the way,
top-down wise? We may as well specify the whole series of option blocks
for our N-body program. Once we are happy with that, we can implement
a parser, and then use all that to replace your previous driver
rkn1.rb.
Bob: Good! In that way, it will be easier to write the parser, with a
concrete example in front of us. I can just take the options from
rkn1.rb, fill in the blanks for the variables, and weave the
appropriate help texts into each block.
Alice: Go right ahead!
2. Encapsulating Information
2.1. A Soft Violation
2.2. Four Occurrences
case opt
when "-d"
dt = arg.to_f
2.3. Danger
dynamics_dt = arg.to_f
and in the STDERR.print command, she has to write:
"dt = ", dynamics_dt, "\n",
Do you see the potential for confusion, and hence mistakes.
2.4. A Matter of Principle
dynamics_dt = arg.to_f
and
"dt = ", dynamics_dt, "\n",
all she would have to do is to go to the storage place where all the real
information is kept, and change one line there. If the information was
kept there as:
Internal variable: dt
then all she would have to do is change this line to:
Internal variable: dynamics_dt
Bob: I think I begin to see how this could be implemented. The two
lines you mentioned could be replaced by:
time_step_variable_name = arg.to_f
and
"dt = ", time_step_variable_name, "\n",
and somewhere an action would be taken that would replace
time_step_variable_name everywhere with whatever the code
writer would have specified in the Internal variable: slot
for the time step option. In the above case,
time_step_variable_name would be replaced everywhere by
dynamics_dt.
2.5. An Option Block
Short name: -n
Long name: --number_of_particles
Value type: int
Default value: 1
Global variable: n_part
Print name: N
Description: Number of particles
Long description:
Number of particles in the N-body system,
that is generated by this program. The
positions will be chosen at random within
a sphere of unit radius, and the velocities
will be set to zero.
The Description content can then be displayed after a -h request,
and the Long description content appears when you give a
--help request. I started the latter on a new line, since it will
be the only piece of information that will need to be spread out over more
than one line, and starting it at the beginning of the line will allow us
to format it properly for display.
2.6. Growing a Manual
Long description:
Number of particles in the N-body system,
that is generated by this program. The
positions will be chosen at random within
a sphere of unit radius, and the velocities
will be set to zero.
Example: "ruby mk_cold_collapse -n 100"
will generate a cold system containing 100
particles.
Alice: So you allow blank lines in the output. Well, why not. If we
consider each help message to form a part of a manual, it would only be
natural to allow new paragraphs and blank lines to appear. It certainly
will make things more readable. We just have to be careful to find a unique
way to get the information listed so as not to confuse two options.
Previous | ToC | Up | Next |