Working on really hard problems, probably the problem of intelligence.
Pinned
- rlvr_pipeline (Public): A composable component orchestrator for Reinforcement Learning from Verifiable Rewards (RLVR) training of Large Language Models on reasoning tasks.
Python 1
- intractai/IntractCodeAPI (Public): An API designed for code completion and fine-tuning of open-source large language models on internal codebases and documents.
- base_reinforcement_learning (Public): The codebase I personally use as the starting point for any reinforcement learning project, built for fast experimentation and analysis.
- Deep-Reinforcement-Learning-CS285-Pytorch (Public): Solutions to the assignments of the Deep Reinforcement Learning course (CS285) offered by the University of California, Berkeley, implemented in PyTorch.




