I needed to process a config file just now. Because of the way it was generated, it contains lines like this:
---(more 15%)---
The first step is to strip these unwanted lines out. As a slight twist, each of these lines is followed by a blank line, which I also want to strip. I created a quick Python script to do this:
    import sys

    skip_next = False
    for line in sys.stdin:
        if skip_next:
            # This is the blank line that follows a marker line: drop it.
            skip_next = False
            continue
        if line.startswith('---(more'):
            skip_next = True
            continue
        print line,
Now, this works, but it's more hacky than I'd hoped. The difficulty is that when looping through the lines, we want the content of one line to affect the subsequent line. Hence my question: What's an elegant way for one loop iteration to affect another?
Answer 1
The reason this feels awkward is that you're fundamentally Doing It Wrong. A for loop is supposed to be a sequential iteration over each element of a series. If you call continue without even looking at the current element, based on something that happened at a previous element of the series, you're breaking that basic abstraction. You then introduce awkwardness through the extra moving parts needed to make the square-peg-in-round-hole solution work.
Instead, try keeping the action close to the condition that causes it. We know that a for loop is just syntactic sugar for a special case of a while loop, so let's use that. I'm not very familiar with Python's I/O subsystem, so treat this as a sketch; it relies on readline() returning an empty string at end of input:

    import sys

    while True:
        line = sys.stdin.readline()
        if not line:  # readline() returns '' at end of input
            break
        if line.startswith('---(more'):
            sys.stdin.readline()  # read the next line and ignore it
            continue
        print line,
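To make the "syntactic sugar" point concrete, here is a rough Python 3 sketch of what a for loop does behind the scenes, applied to this problem. The function name strip_markers and its marker parameter are made up for illustration, and it takes any iterable of lines instead of reading sys.stdin directly.

```python
def strip_markers(lines, marker='---(more'):
    """Yield lines, dropping each marker line and the line right after it."""
    it = iter(lines)            # a for loop starts by asking for an iterator
    while True:
        try:
            line = next(it)     # ...and asks it for the next item on every pass
        except StopIteration:
            return              # a for loop would simply end here
        if line.startswith(marker):
            next(it, None)      # consume and discard the following line, if any
            continue
        yield line
```

Under these assumptions, sys.stdout.writelines(strip_markers(sys.stdin)) should behave like the script in the question.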
Answer 2
Another way to do this is with itertools.tee, which splits an iterator into two. You can then delay one copy by one step, so that it runs one line behind the other, zip the two copies together, and look at both the current line and the previous line at each step of the for loop (I prepend an empty string to the delayed copy so that the first line also has a "previous" line):

    import sys
    from itertools import chain, izip, tee

    lines, prevlines = tee(sys.stdin, 2)
    prevlines = chain([''], prevlines)  # delay this copy by one line
    for line, prevline in izip(lines, prevlines):
        if line.startswith('---(more') or prevline.startswith('---(more'):
            continue
        print line,
This could also be done as an equivalent generator expression:
    import sys
    from itertools import chain, izip, tee

    lines, prevlines = tee(sys.stdin, 2)
    pairs = izip(lines, chain([''], prevlines))  # (line, previous line)
    res = (line for line, prevline in pairs
           if not line.startswith('---(more')
           and not prevline.startswith('---(more'))
    for line in res:
        print line,
Or you could use filter, which keeps only the items for which a condition is true:

    import sys
    from itertools import chain, izip, tee

    lines, prevlines = tee(sys.stdin, 2)
    pairs = izip(lines, chain([''], prevlines))  # (line, previous line)
    cond = lambda pair: not pair[0].startswith('---(more') and not pair[1].startswith('---(more')
    res = filter(cond, pairs)
    for line, prevline in res:  # filter yields whole pairs, so unpack them
        print line,
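The pairing idea can also be separated from the filtering. The following is only a Python 3 sketch: with_previous and drop_marked are hypothetical helper names, and the empty-string padding for the first line is an assumption that suits this particular filter.

```python
from itertools import chain, tee

def with_previous(lines, first_previous=''):
    """Return an iterator of (line, previous_line) pairs.

    The first line is paired with first_previous, since it has no predecessor.
    """
    current, previous = tee(lines)
    previous = chain([first_previous], previous)  # delay this copy by one item
    return zip(current, previous)                 # stops when `current` runs out

def drop_marked(lines, marker='---(more'):
    """Keep lines that are neither a marker nor directly preceded by one."""
    return [line for line, prev in with_previous(lines)
            if not line.startswith(marker) and not prev.startswith(marker)]
```

For example, with_previous(['a', 'b', 'c']) produces ('a', ''), ('b', 'a'), ('c', 'b').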
If you are willing to go outside the Python standard library, the toolz package makes this even easier. It provides a sliding_window function, which turns an iterator such as a b c d e f into something like (a,b), (b,c), (c,d), (d,e), (e,f). This does basically the same thing as the tee approach above; it just combines three lines into one. Each pair holds the earlier line first, and an empty string is prepended so that the first line also shows up as the second item of a pair:

    import sys
    from itertools import chain
    from toolz.itertoolz import sliding_window

    for prevline, line in sliding_window(2, chain([''], sys.stdin)):
        if line.startswith('---(more') or prevline.startswith('---(more'):
            continue
        print line,
You could additionally use remove, which is basically the opposite of filter, to drop the items without an explicit test inside the for loop:

    import sys
    from itertools import chain
    from toolz.itertoolz import sliding_window, remove

    pairs = sliding_window(2, chain([''], sys.stdin))  # (previous line, line)
    cond = lambda x: x[0].startswith('---(more') or x[1].startswith('---(more')
    res = remove(cond, pairs)
    for prevline, line in res:
        print line,
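For readers who would rather not install toolz, here is a standard-library-only Python 3 sketch of the sliding-window idea described above. It is not the toolz implementation; sliding_window_sketch is a hypothetical name, and the deque-based approach is just one way to do it.

```python
from collections import deque
from itertools import islice

def sliding_window_sketch(n, iterable):
    """Yield overlapping tuples of n consecutive items from iterable."""
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)  # fill the first window
    if len(window) == n:                     # fewer than n items: yield nothing
        yield tuple(window)
    for item in it:
        window.append(item)                  # maxlen drops the oldest item
        yield tuple(window)
```

Each tuple holds the older item first, so with n = 2 the pair reads (previous, current).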
Answer 3
In this case, we can skip a line by manually advancing the iterator. This results in code that is somewhat similar to Mason Wheeler's solution, but still uses the iteration syntax. There is a related Stack Overflow question:
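    import sys

    for line in sys.stdin:
        if line.startswith('---(more'):
            # Skip the line that follows; the None default avoids a
            # StopIteration if the marker is the last line of input.
            next(sys.stdin, None)
            continue
        print line,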
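The same trick can be sketched for arbitrary iterables in Python 3. The name skip_after_marker and its marker parameter are invented for this example; the one assumption that matters is that the loop and the next() call share a single iterator.

```python
def skip_after_marker(lines, marker='---(more'):
    """Yield lines, dropping each marker line and the line right after it."""
    it = iter(lines)          # one shared iterator for the loop and for next()
    for line in it:
        if line.startswith(marker):
            next(it, None)    # discard the following line, if there is one
            continue
        yield line
```

Calling iter() once matters when the input is a list: a list is not its own iterator, so calling next() on it directly would raise a TypeError. sys.stdin is its own iterator, which is why the code above can call next() on it without this extra step.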