Thinking fast and slow with LLMs

LLMs have a lot of potential for on-device and edge computing, but right now they are too big to run efficiently on local hardware while maintaining good quality. I wonder if there is an approach that mimics the model Daniel Kahneman outlines in his book “Thinking, Fast and Slow”, where he argues that the brain seems to be segmented into two cognitive systems: System 1 and System 2.

System 1 would be a fast, small LLM that runs “unconsciously”. It is not as smart, and it is built from scratch and purposely kept small.

System 2 would be the slower, conscious system. This would be something like ChatGPT, which runs in the cloud and which you call on occasionally for heavy computation. You use the answers from System 2 to inform System 1 and to help train it, but you want the system to run on System 1 as often as possible.
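Using System 2's answers to train System 1 is essentially knowledge distillation. A minimal sketch of what that feedback loop might look like, assuming a simple in-memory buffer and hypothetical model stubs (nothing here is a real API):

```python
# Hypothetical sketch: whenever System 2 is consulted, record the
# (prompt, answer) pair so it can later be used to fine-tune System 1.
# Storage is a plain in-memory list; a real system would persist this.

distillation_buffer = []

def record_for_distillation(prompt, system2_answer):
    """Save a System 2 answer as a future training example for System 1."""
    distillation_buffer.append({"prompt": prompt, "completion": system2_answer})

def export_training_data():
    """Return examples in a prompt/completion format typical for fine-tuning."""
    return list(distillation_buffer)
```

The point is just that every escalation to the cloud doubles as free supervision for the local model, so System 1 should gradually need System 2 less often for similar questions.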

You would use System 2 to reflect and think about bigger problems.

System 1 would need to know when it doesn’t have a good answer, and then query System 2.
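One plausible way for System 1 to "know" this is to use its own token probabilities as a confidence signal and escalate when they fall below a threshold. A minimal sketch, with stubbed-out model calls and an assumed (tunable) cutoff:

```python
import math

# Hypothetical sketch: route a prompt to the small local model ("System 1")
# and escalate to the large remote model ("System 2") only when the local
# model's confidence is low. Both model calls are stubs, not real APIs.

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; would need tuning in practice

def system1_generate(prompt):
    """Stub for the small on-device model: returns (answer, token_logprobs)."""
    return "local answer", [-0.1, -0.2, -0.05]

def system2_generate(prompt):
    """Stub for the large cloud model."""
    return "cloud answer"

def confidence(token_logprobs):
    """Mean token probability: a crude self-confidence signal."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer(prompt):
    """Answer locally when confident; otherwise escalate to System 2."""
    local_answer, logprobs = system1_generate(prompt)
    if confidence(logprobs) >= CONFIDENCE_THRESHOLD:
        return local_answer, "system1"
    return system2_generate(prompt), "system2"
```

Mean token probability is only one option; a small model could also be trained to emit an explicit "I don't know" signal, which may be more reliable than raw log-probs.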
