Minimal concepts needed to encode human language in a computer

I propose that there exists a minimal set of concepts that can be used to describe and encode almost all of the meaning in human language. From this base set of concept primitives, more complex conjoined concepts can be built through compositionality and abstract structure. One cannot learn complex concepts like quantum mechanics, calculus, or homework without first learning many intermediary concepts, but those intermediary concepts can in turn be traced back to these concept primitives.
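As a rough illustration of what composition over primitives could look like, here is a minimal sketch. The primitive names and the way "homework" is decomposed below are purely hypothetical stand-ins, not a claim about the real base set.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical primitives -- placeholders for whatever the real base set turns out to be.
PRIMITIVES = {"thing", "do", "want", "know", "before", "after", "more", "part"}

@dataclass(frozen=True)
class Primitive:
    name: str  # expected to be one of PRIMITIVES

@dataclass(frozen=True)
class Composite:
    label: str                    # convenience name for the composed concept
    parts: tuple["Concept", ...]  # sub-concepts it is built from

Concept = Union[Primitive, Composite]

def primitives_of(concept: Concept) -> set:
    """Trace any concept back to the primitives it bottoms out in."""
    if isinstance(concept, Primitive):
        return {concept.name}
    found = set()
    for part in concept.parts:
        found |= primitives_of(part)
    return found

# "homework" sketched as work one is expected to do before a deadline (illustrative only).
work = Composite("work", (Primitive("do"), Primitive("thing")))
homework = Composite("homework", (work, Primitive("want"), Primitive("before")))
print(primitives_of(homework))  # {'do', 'thing', 'want', 'before'}
```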

By knowing the base concepts, a human could communicate and survive in our world.

And if a computer could model these base concepts in a way similar to humans, then it could learn to represent everything else in the human vocabulary.

Concepts are models and can be described with words. I will use words to describe each concept.

There has been some research into related areas. Toki Pona is a constructed language with the goal of simplifying thought and communication. It has ~137 root words, and all other words are derived from them.

Natural Semantic Metalanguage (NSM) is a linguistic theory that posits approximately 65 semantic primes that are universal across most human languages. If you look up those semantic primes in a dictionary, they cannot be defined in other terms without running into circular references. For example, “to touch” may be defined as “to make contact,” and when you look up the definition of “to make contact,” it is described as “to touch.” It is also argued that these semantic primes cannot be learned by studying definitions; instead they are innate concepts that all humans understand and that are learned through practice.
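That circularity is easy to check mechanically. Here is a toy sketch that follows definition words until it either bottoms out or loops back; the mini-dictionary is invented for illustration and is not drawn from any real lexicon.

```python
from typing import Optional

# Toy dictionary: each word maps to the words used in its definition (invented example).
definitions = {
    "touch": ["make", "contact"],
    "contact": ["touch"],
    "make": ["cause", "exist"],
    "cause": ["make", "happen"],
}

def find_cycle(word: str, path: Optional[list] = None) -> Optional[list]:
    """Return a chain of definitions that leads back to `word`, if one exists."""
    path = path or [word]
    for next_word in definitions.get(word, []):
        if next_word == path[0]:
            return path + [next_word]  # looped back: circular definition
        if next_word not in path:
            cycle = find_cycle(next_word, path + [next_word])
            if cycle:
                return cycle
    return None

print(find_cycle("touch"))  # ['touch', 'contact', 'touch']
```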

I worked on an early prototype of teaching representations of core concepts to computers here, with a focus on computer representation rather than on finding all the core concepts.
