Hi -- thanks for LintSeq / TinyCodeLM (arXiv:2410.02749). Your 150M result is, as far as I can find, the smallest published code LM with a non-zero HumanEval pass@1 (6.1 pretrain, 12.8 with edit-sequence fine-tune), and the edit-sequence reframing is the part that interests me most: a data-FORMAT lever, not just more tokens.
Context for one honest question. I have been probing a deliberately tiny CPU code model (a transformer at 148K and 493K params, vocab 263, max-seq 512) on a small C-function corpus. Measured result: compile@1 = 0/4 at BOTH 148K and 493K params, so width does not look like the lever at this scale; a 4x larger corpus cut val-bpb (validation bits per byte) by 2.8x, but compile@1 stayed at 0. I built a small deterministic calculator that places this config against published floors:
- param gap to TinyCodeLM-150M: ~304x; token gap: ~1e6x (72K tokens vs 72e9)
- Chinchilla-optimal (20 tok/param) token deficit at 493K params: ~137x (~9.9M tokens needed vs 72K)
- Wilson 95% upper bound on the true rate after 0 successes: n=4 -> 0.49, n=16 -> 0.19, n=32 -> 0.11
  (so 0/4 cannot even rule out a true rate of up to ~49%)
So my non-result is the expected outcome: I am orders of magnitude below your floor on BOTH params and tokens.
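In case it helps to check the arithmetic, the figures above can be reproduced with a sketch along these lines. It is a minimal illustration, not the calculator itself: the function names `wilson_upper` and `scale_gaps` are hypothetical, the reference values (150M params, 72e9 tokens, 20 tok/param) are the ones quoted above, and z = 1.959964 is assumed for the two-sided 95% level.

```python
import math

Z95 = 1.959964  # assumed two-sided 95% normal quantile


def wilson_upper(successes, n, z=Z95):
    """Upper end of the Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center + half


def scale_gaps(params, tokens, ref_params=150e6, ref_tokens=72e9, tok_per_param=20):
    """Ratios of a reference model's scale (and a tokens-per-param rule) to ours."""
    return {
        "param_gap": ref_params / params,          # reference params / our params
        "token_gap": ref_tokens / tokens,          # reference tokens / our tokens
        "chinchilla_deficit": tok_per_param * params / tokens,  # tokens wanted / tokens had
    }


if __name__ == "__main__":
    for n in (4, 16, 32):
        # With 0 successes the bound reduces to z^2 / (n + z^2)
        print(n, round(wilson_upper(0, n), 2))
    print(scale_gaps(params=493e3, tokens=72e3))
```

With 0 successes the Wilson bound collapses to z^2 / (n + z^2), which is why it shrinks so slowly with n: even n=32 clean failures leave roughly an 11% true rate on the table.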
The one question I could not answer from the paper: does the edit-sequence data-format advantage (the relative pass@1 gain you see at 150M from re-expressing the same source as error-free incremental diffs) appear to transfer DOWN-scale, or does a token-quantity floor dominate first below, say, ~50M params? I am not asking for support or review -- just whether you have any read on where the format lever stops helping.
No reply needed if this is already covered in an appendix I missed. Thanks for the open weights and the clear write-up.
(Note: your TinyCodeLM is Python/HumanEval; my probe is a separate tiny C model -- I am asking about the down-scale transfer of the edit-sequence idea, not comparing the two directly.)