The Get_Line Mystery



This is a subtle feature of Get_Line that may seem confusing at first
blush.  But it really does make sense once explained.  Read on.

The issue is that Get_Line reads into a fixed-size buffer.  What if the
input line is as long as, or longer than, the buffer?

You can read the entire line using multiple invocations of Get_Line, but
somehow Get_Line has to tell you that it has read all of the current
line.  How does it do this?

Consider another, similar problem.  Suppose you have a stream of
integers you want to read in from standard input, like this:

  loop
    Get (N);
    < process N >
  end loop;

But there's a problem here.  How does this loop terminate?

A standard idiom for this kind of thing is to use a sentinel of some
kind: a special value, outside the range of "normal" values, that
indicates that all of the stream has been consumed.

In our example above, we might use the value 0 to indicate that there
are no more values to read.  So our loop now looks like this:

  loop
    Get (N);
    exit when N = 0;
    < process N >
  end loop;

Now all is well, and the loop terminates when it reads in the value 0.
This scheme works because 0 isn't in the set of values you want to
process.
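A complete, runnable version of the sentinel loop might look like the following.  It is a sketch under stated assumptions: so that it runs without typing anything in, it reads from a hypothetical string Input instead of standard input (using the Get-from-string form of Ada.Integer_Text_IO.Get), and "processing" a value just means adding it to a running total.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Sentinel_Demo is
   --  Hypothetical stand-in for standard input.  The 0 is the sentinel;
   --  the 9 after it is never read.
   Input : constant String := "3 1 4 1 5 0 9";
   Next  : Positive := Input'First;  --  where the next Get starts
   Last  : Positive;                 --  last character consumed by Get
   N     : Integer;
   Sum   : Integer := 0;
begin
   loop
      Get (From => Input (Next .. Input'Last), Item => N, Last => Last);
      Next := Last + 1;
      exit when N = 0;   --  sentinel: no more values to process
      Sum := Sum + N;    --  "process N" (here, just add it up)
   end loop;
   Put_Line ("Sum =" & Integer'Image (Sum));
end Sentinel_Demo;
```

Without the 0 in the input, Get would eventually run off the end of the string and raise End_Error; the sentinel is what lets the loop stop cleanly.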

Back to Get_Line and reading a string.  What we need is a way for
Get_Line to read in a value "outside the range of string," which
indicates to the caller that there is nothing left on the current line.  

It's exactly analogous to our integer stream.  You read in a special
value "outside the range of process-able integers," which indicates to
the caller that there is nothing left in the stream.

In order to read in the entire line, we'd structure our loop much as we
did above, but for a "stream of substrings":

declare
  Line : String (1 .. 10);
  Last : Natural;
begin
  loop
    Get_Line (Line, Last);

    < process Line (1 .. Last) >

    exit when ???;
  end loop;
end;

What is our termination condition?  It is this: "all text on the current
line has been consumed" is indicated by the condition "the input buffer
has some unused characters."

We can state that termination condition in code as

  Last < Line'Last

Basically, at the point that Get_Line returns fewer characters than
you've allocated in your input buffer, it means all of the current line
has been consumed.

Most of the time you don't have to think about this, because you usually
allocate a larger buffer than you'll ever need.  That's why an input
buffer is typically 80 or 132 characters long.

Actually, if you want to be technical, you'll want to allocate one extra
character (one that you know will go unused) so that you can consume all
of the input line with a single invocation of Get_Line.  So technically,
you should declare an input buffer that is 81 or 133 characters long.

In your example, you allocated a buffer that's 10 characters long:

declare
  Line : String (1 .. 10);
  Last : Natural;
begin
  Get_Line (Line, Last);

Everything is fine and dandy when the input line contains fewer than 10
characters.  And if the input line is longer than 10 characters, it
makes intuitive sense that you have to do more than one fetch, because
your buffer is too small to contain all the input.

The weird case is when the input line contains the same number of
characters as the size of the input buffer.  But remember that there has
to be at least one character in the input buffer that goes unused,
because that is how Get_Line tells you it read all the input.

If the input line is the same length as the input buffer, then you have
to wait until the next Get_Line to learn that "all input on the current
line has been consumed."  The first Get_Line returns all the text on the
line (in your example, the buffer is filled with all 10 characters, and
Last = Line'Last), and the second Get_Line reads no characters at all
(Last = 0), which tells you you're done (Last < Line'Last).

That's the technical explanation of why you have a "problem" using
Get_Line.  However, this behavior makes sense once you grok the idea
that at least one slot must go unfilled to indicate that the entire
line has been consumed.  (In your example, every slot goes unfilled in
the second call to Get_Line.)
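To see the exact-fit case for yourself, here is a small self-contained program.  It is a sketch: rather than reading standard input, it writes a hypothetical scratch file, exact_fit_demo.txt, containing one 10-character line, and then reads it back into a 10-character buffer.  Going by the Ada Reference Manual's description of Get_Line, the first call should report Last = 10 and the second Last = 0.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Exact_Fit_Demo is
   --  Hypothetical scratch file, created and deleted by this program.
   Scratch_Name : constant String := "exact_fit_demo.txt";
   File         : File_Type;
   Line         : String (1 .. 10);
   First_Last   : Natural;
   Second_Last  : Natural;
begin
   --  Write a single line that is exactly as long as the buffer.
   Create (File, Out_File, Scratch_Name);
   Put_Line (File, "0123456789");
   Close (File);

   Open (File, In_File, Scratch_Name);

   --  First call: the buffer fills before the line terminator is reached,
   --  so Last = Line'Last and the terminator is left unread.
   Get_Line (File, Line, First_Last);
   Put_Line ("First call:  Last =" & Natural'Image (First_Last)
             & ", Line = " & Line (1 .. First_Last));

   --  Second call: the terminator is met immediately, so no characters
   --  are stored, Last = Line'First - 1 = 0, and the terminator is skipped.
   Get_Line (File, Line, Second_Last);
   Put_Line ("Second call: Last =" & Natural'Image (Second_Last));

   Delete (File);  --  removes the scratch file (and closes it)
end Exact_Fit_Demo;
```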

Now that you know why, you should be thinking, "OK, what do I need to
do?"  Here are a few ideas:

1) Decide by fiat the maximum line length you'll accept, and allocate
   an input buffer with one extra character:

declare
   Max_Length_Of_Input : constant := 80;
   Line : String (1 .. Max_Length_Of_Input + 1);
   Last : Natural;
begin
   Get_Line (Line, Last);
   < process Line (1 .. Last) >
end;


2) Allow any length as input.  Cache the input substring-at-a-time in an
   unbounded string buffer, and then process that when you've read in
   the entire line:

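--  Requires: with Ada.Text_IO;           use Ada.Text_IO;
--            with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
declare
   Buffer : Unbounded_String;
   Line   : String (1 .. 10);  -- actual length doesn't really matter
   Last   : Natural;
begin
   loop
      Get_Line (Line, Last);
      Append (Buffer, Line (1 .. Last));  -- Append (Source, New_Item)
      exit when Last < Line'Last;
   end loop;

   < process Buffer >
end;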


3) If you're using GNAT, then the same functionality is provided for you
   by the GNAT-specific package Ada.Strings.Unbounded.Text_IO:

package Ada.Strings.Unbounded.Text_IO is

   function Get_Line                                return Unbounded_String;
   function Get_Line (File : Ada.Text_IO.File_Type) return Unbounded_String;
   --  Reads up to the end of the current line, returning the result
   --  as an unbounded string of appropriate length. If no File parameter
   --  is present, input is from Current_Input.
...
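
Idea 1 can be fleshed out into a complete program that also checks whether the spare slot was used.  This is a sketch under stated assumptions: the scratch file max_length_demo.txt and the Accepted/Rejected counters are hypothetical, and the policy for an over-long line (report it, then discard the rest of it with Skip_Line) is just one reasonable choice, not something the approach requires.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Max_Length_Demo is
   Max_Length_Of_Input : constant := 80;

   --  One slot more than the longest line accepted; it should stay unused.
   Line : String (1 .. Max_Length_Of_Input + 1);
   Last : Natural;

   --  Hypothetical scratch file and counters, for demonstration only.
   Scratch_Name : constant String := "max_length_demo.txt";
   File         : File_Type;
   Accepted     : Natural := 0;
   Rejected     : Natural := 0;
begin
   Create (File, Out_File, Scratch_Name);
   Put_Line (File, String'(1 .. 80 => 'a'));  --  exactly the maximum
   Put_Line (File, String'(1 .. 81 => 'b'));  --  one character too long
   Put_Line (File, "short");
   Close (File);

   Open (File, In_File, Scratch_Name);
   while not End_Of_File (File) loop
      Get_Line (File, Line, Last);
      if Last < Line'Last then
         --  The spare slot went unused: the whole line was consumed.
         Accepted := Accepted + 1;
         Put_Line ("Accepted a line of" & Natural'Image (Last)
                   & " characters");
      else
         --  The spare slot was filled: the line exceeds the maximum.
         --  Skip_Line discards the rest of it, including its terminator.
         Rejected := Rejected + 1;
         Put_Line ("Rejected a line longer than"
                   & Natural'Image (Max_Length_Of_Input) & " characters");
         Skip_Line (File);
      end if;
   end loop;
   Delete (File);
end Max_Length_Demo;
```

Note that the 80-character line is accepted in a single call precisely because of the extra slot: the 81st slot stays empty, so Last < Line'Last.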

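Idea 2 can likewise be packaged as a reusable function.  The function name Get_Whole_Line and the scratch file whole_line_demo.txt are hypothetical; the 4-character buffer is deliberately tiny so that one test line is longer than the buffer, one exactly fills it, and one is shorter.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Whole_Line_Demo is

   --  Hypothetical helper: reads the rest of the current line of File,
   --  however long, using a small fixed-size buffer.
   function Get_Whole_Line (File : File_Type) return Unbounded_String is
      Buffer : Unbounded_String;
      Line   : String (1 .. 4);  --  tiny on purpose, to force several calls
      Last   : Natural;
   begin
      loop
         Get_Line (File, Line, Last);
         Append (Buffer, Line (1 .. Last));
         --  A partly filled buffer means the line terminator was reached.
         exit when Last < Line'Last;
      end loop;
      return Buffer;
   end Get_Whole_Line;

   Scratch_Name : constant String := "whole_line_demo.txt";  --  hypothetical
   File         : File_Type;
   Lines        : array (1 .. 3) of Unbounded_String;
   Count        : Natural := 0;
begin
   Create (File, Out_File, Scratch_Name);
   Put_Line (File, "abcdefghij");  --  longer than the buffer
   Put_Line (File, "wxyz");        --  exactly the buffer length
   Put_Line (File, "ab");          --  shorter than the buffer
   Close (File);

   Open (File, In_File, Scratch_Name);
   while not End_Of_File (File) loop
      Count := Count + 1;
      Lines (Count) := Get_Whole_Line (File);
      Put_Line ("[" & To_String (Lines (Count)) & "]");
   end loop;
   Delete (File);
end Whole_Line_Demo;
```

Because of the exit test Last < Line'Last, the exact-fit line "wxyz" takes two calls (the second one returning Last = 0) and comes back intact, neither truncated nor merged with the line after it.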

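With GNAT, idea 3 reduces the whole job to one function call.  This sketch assumes GNAT (the package is GNAT-specific) and uses only the Get_Line (File) function shown in the package excerpt; the scratch file gnat_get_line_demo.txt and its single 200-character line are hypothetical, chosen to be longer than any buffer you'd be tempted to declare.

```ada
with Ada.Text_IO;                   use Ada.Text_IO;
with Ada.Strings.Unbounded;         use Ada.Strings.Unbounded;
with Ada.Strings.Unbounded.Text_IO; --  GNAT-specific package

procedure Gnat_Get_Line_Demo is
   Scratch_Name : constant String := "gnat_get_line_demo.txt";  --  hypothetical
   File         : File_Type;
   Long_Line    : constant String (1 .. 200) := (others => 'x');
   Input        : Unbounded_String;
begin
   Create (File, Out_File, Scratch_Name);
   Put_Line (File, Long_Line);
   Close (File);

   Open (File, In_File, Scratch_Name);
   --  No buffer to size and no loop: the function returns the whole line.
   Input := Ada.Strings.Unbounded.Text_IO.Get_Line (File);
   Put_Line ("Read" & Natural'Image (Length (Input)) & " characters");
   Delete (File);
end Gnat_Get_Line_Demo;
```

The call is written fully qualified so that it cannot be confused with any Get_Line made visible by the use clause for Ada.Text_IO.
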
Contributed by: Matthew Heaney
Contributed on: January 24, 1999
License: Public Domain