2 Comments
blaine wishart

I like the idea of Chartpacks. Looking forward to more.

Moving past the Turing Test is refreshing. The notion that we are moving toward AGI is in the air, but the notion is not crisp. Goethe's Faust comes to mind (The Study, scene 3). Goethe moves from an examination of ‘the word’ to ‘the act’. Is this line of thinking useful for considering benchmarks?

It seems like a new level of reinforcement learning has been central to recent progress. Should we measure interactive human input in addition to the size of text examined and parameter count?

Examination of natural scenes as training data may loom large. Another thing to measure.

The Bloom project from BigScience suggests the amount and quality of energy required are important.

I just did a quick GPT check. Evidently about 15% of the world's population speaks English as a first or second language. To the extent that we are interested in general intelligence, measuring performance across languages is important. In this regard, the reference to Stephen Jay Gould’s Mismeasure of Man is especially valuable.

Nathan Warren

Excellent analogy with Faust! I think it's increasingly important to start framing these models' impacts in terms of their downstream effects. This sort of thinking will rapidly move into the limelight when these models start interfacing with the world (e.g. doing tasks through plugins and API connections).

Interesting point on reinforcement learning. I've only seen comparisons of models with vs. without RL from human feedback; I haven't seen any studies on its effect as you increase the amount (I'll have to take a look).
