
Not getting perf improvements from muP at ~1.5B scale #76

@gordicaleksa


Hey guys, first of all thanks for the awesome work!

I've implemented muP in the llm.c project (see here). The coord checks look flat/correct (I went up to 15 steps and they stay flat), but I'm not getting any performance improvement from muP.

Could this be due to the smaller scale? We're testing on ~1.5B-parameter LLMs. Should we expect different behavior at ~7B?

I wrote up a short document on what I've done to support muP in llm.c here, under mup.md.

Am I missing something here?
