Python 3, 2·267 + 510193 = 510727
Predictor
```python
def p():
 d={};s=b''
 while 1:
  p={0:1};r=range(len(s)+1)
  for i in r:
   for c,n in d.setdefault(s[:i],{}).items():p[c]=p.get(c,1)*n**b'\1\6\f\36AcWuvY_v`\270~\333~'[i]
  c=yield max(sorted(p),key=p.get)
  for i in r:e=d[s[:i]];e[c]=e.get(c,1)+1
  s=b'%c'%c+s[:15]
```
This uses a weighted Bayesian combination of the order 0, …, 16 Markov models, with weights [1, 6, 12, 30, 65, 99, 87, 117, 118, 89, 95, 118, 96, 184, 126, 219, 126].
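Ungolfed, the same prediction rule reads roughly like this (the names are mine; the logic mirrors the golfed version above):

```python
WEIGHTS = [1, 6, 12, 30, 65, 99, 87, 117, 118, 89, 95, 118, 96, 184, 126, 219, 126]

def predictor():
    counts = {}    # context bytes (newest first) -> {next_byte: count}
    history = b''  # up to 16 most recent bytes, newest first
    while True:
        scores = {0: 1}  # byte 0 as a default guess with score 1
        contexts = range(len(history) + 1)
        for k in contexts:
            # the order-k Markov model's counts for this context
            for byte, n in counts.setdefault(history[:k], {}).items():
                # multiply in this model's vote: count raised to its weight
                scores[byte] = scores.get(byte, 1) * n ** WEIGHTS[k]
        # sorted() makes ties break toward the smallest byte value
        guess = max(sorted(scores), key=scores.get)
        actual = yield guess
        # update every model with the byte that actually occurred
        for k in contexts:
            model = counts[history[:k]]
            model[actual] = model.get(actual, 1) + 1
        history = bytes([actual]) + history[:15]
```

Because the weights act as exponents on the counts, the product of the models is a weighted geometric combination: taking logs turns it into a weighted sum of log-counts.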
The result is not very sensitive to the selection of these weights, but I optimized them because I could, using the same late acceptance hill-climbing algorithm that I used in my answer to “Put together a Senate majority”, where each candidate mutation is just a ±1 increment to a single weight.
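For reference, that optimization loop can be sketched as follows. This is a generic late acceptance hill-climbing skeleton, not the exact tuning script; the toy cost function below is a stand-in, since the real objective re-runs the predictor over the whole text and counts wrong guesses:

```python
import random

def lahc(weights, cost, steps=5000, memory=50):
    """Late acceptance hill-climbing: accept a candidate if it beats the
    current cost OR the cost from `memory` steps ago."""
    current = list(weights)
    c = cost(current)
    best, best_c = list(current), c
    history = [c] * memory  # ring buffer of past accepted costs
    for t in range(steps):
        candidate = list(current)
        # each candidate mutation is a +/-1 increment to a single weight
        i = random.randrange(len(candidate))
        candidate[i] += random.choice([-1, 1])
        cc = cost(candidate)
        if cc <= c or cc <= history[t % memory]:
            current, c = candidate, cc
            if c < best_c:
                best, best_c = list(current), c
        history[t % memory] = c
    return best
```

The "late" comparison against an older cost lets the search drift through plateaus and small uphill moves that plain hill climbing would reject.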
Test code
```python
with open('whale2.txt', 'rb') as f:
    g = p()
    wrong = 0
    a = next(g)
    for b in f.read():
        wrong += a != b
        a = g.send(b)
    print(wrong)
```