GPT-4 for Nim?
ChatGPT (3.5) already works surprisingly well for Nim; see the note at the end of the section https://ssalewski.de/nimprogramming.html#_about_this_book. I recently subscribed to GPT-4, which is generally great. Of course, neither version has much Nim data available, and the training data currently only extends up to September 2021. For non-native speakers GPT is nice, as it can write text for you; you only have to proofread it. My feeling is that, for topics with little training data, roughly 10 percent of GPT's output is simply wrong. Once it becomes easier to feed GPT custom resources such as HTML and PDF files, it will become even more valuable.
Due to Nim's small community, there are not enough publicly available libraries to use as a reference.
This seems wrong to me. Some reasons below:
First, there are many algorithms and snippets on Rosetta Code. We also have many libraries on GitHub and on other hosting services (see e.g. icedquinn/icedgmath).
Not enough to train on? Just a few posts down, there have been some experiments with ChatGPT. It has trained on quite a lot of old, obsolete code, like the Rosetta Code entries that should be updated.
I personally use GitHub Copilot, which is great.
@Yardanico wrote me a trie implementation for a syntax checker in ~3 minutes thanks to GitHub Copilot (GHC). He corrected many of the GHC proposals, but here it is:
import std/[strutils, options]

const
  Letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

type
  TrieNode = ref object
    isWord: bool            # true if the path from the root spells a word
    s: array[52, TrieNode]  # one child slot per letter in Letters

# Insert word `w` into the trie, creating nodes as needed.
proc add(t: var TrieNode, w: string, i = 0): TrieNode =
  if t == nil:
    t = TrieNode()
  if i == len(w):
    t.isWord = true
  else:
    t.s[Letters.find(w[i])] = add(t.s[Letters.find(w[i])], w, i + 1)
  result = t

proc buildTrie(s: seq[string]): TrieNode =
  for word in s:
    result = result.add(word)

# Find a stored word within edit distance `dist` of `w`, if any.
proc search(t: TrieNode, dist: int, w: string, i = 0): Option[string] =
  if i == w.len:
    if t != nil and t.isWord and dist == 0:
      return some("")
    else:
      return
  if t == nil:
    return
  # Exact match of w[i].
  var f = t.s[Letters.find(w[i])].search(dist, w, i + 1)
  if f.isSome:
    return some(w[i] & f.get())
  if dist == 0:
    return
  for j in 0 ..< 52:
    # Insertion of Letters[j].
    f = t.s[j].search(dist - 1, w, i)
    if f.isSome:
      return some(Letters[j] & f.get())
    # Substitution of w[i] with Letters[j].
    f = t.s[j].search(dist - 1, w, i + 1)
    if f.isSome:
      return some(Letters[j] & f.get())
  # Deletion of w[i].
  t.search(dist - 1, w, i + 1)

# Return the closest word by iteratively increasing the allowed edit distance.
proc spellCheck(t: TrieNode, word: string): string =
  assert t != nil
  var dist = 0
  while true:
    let res = t.search(dist, word)
    if res.isSome:
      return res.get()
    inc dist

when isMainModule:
  let t = buildTrie(@["hello", "world", "hell", "word", "definitive", "definition"])
  # echo t.spellCheck("hel")
  echo t.spellCheck("def")
If we update the old Rosetta Code entries (~1000 pages) to Nim 2.0 and develop https://github.com/TheAlgorithms/Nim, this will improve A.I. tooling even more.
@dlesnoff
You are correct that there is more than enough to train a general AI.
The catch is that GPT is not a general AI. It does not recognize patterns and algorithms. It simply predicts words from the previous words, drawing on a high volume of training data. To turn that into useful code generation, it needs not just Rosetta Code, but millions of people publishing code examples subtly similar to Rosetta Code in other projects. It needs sheer volume.
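As a deliberately crude illustration of "predicting the next word from the previous words": a toy bigram model only counts which word most often follows which (GPT uses a large neural network over far longer contexts, not frequency counts, and all names in this sketch are made up):

```nim
import std/[tables, strutils]

# Toy bigram "language model": count, for each word in a tiny corpus,
# which word follows it, then predict the most frequent successor.
proc train(corpus: string): Table[string, CountTable[string]] =
  let words = corpus.splitWhitespace()
  for i in 0 ..< words.len - 1:
    if words[i] notin result:
      result[words[i]] = initCountTable[string]()
    result[words[i]].inc(words[i + 1])

proc predict(model: Table[string, CountTable[string]], word: string): string =
  if word in model:
    result = model[word].largest.key  # most frequent successor
  else:
    result = "?"                      # word never seen during training

when isMainModule:
  let model = train("the cat sat on the mat the cat ran")
  echo model.predict("the")   # "cat" followed "the" twice, "mat" once
```

The failure mode of such a model is the same one described above, only more extreme: with too little training text there is simply nothing to count, which is why volume matters so much.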
I'm actually impressed that it works on the small sample size it has. I suspect it is boosted by the non-Nim languages: basically, it treats Nim as a dialect, predicting known word sequences it has seen in other languages.
It's a very simple example, but one where it could be useful. I've had more luck using it to explain code or find errors in code than to write code for me. Even for basic algorithms, I was more productive consulting Rosetta Code directly than trying to make ChatGPT's code work... maybe GPT-4 or GHC are better.
The Nim training dataset was certainly much smaller than those of many other languages, so the model is clearly using a lot of knowledge by analogy from those other languages (this can be seen in some errors in the generated code). It's impressive how it transfers knowledge from one language to another, like a polyglot. Smaller models like Alpaca 13B are terrible at niche programming languages like Nim, or niche natural languages like Esperanto, while ChatGPT is usable. Those smaller models can probably become a lot better with focused training, though.
And as I said in the other post, I wish the Nim development team would cooperate with OpenAI to add Nim syntax highlighting to ChatGPT.
This is not something special that the Nim team has to do, except for continuing to publish more working Nim code.
The GPT-like models are large statistical models that really only try to predict the next word when generating text in response to a prompt. These models are trained on huge amounts of text, similar in scale to search engine indexing. However, their ability to answer questions in a useful way has become so good that they are called AI in a very general sense. They can even build up a context.
You can use GPT-3.5 and GPT-4 through OpenAI's APIs for your own applications, with GPT-3.5 being much cheaper.
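A minimal sketch of such an API call from Nim might look like the following; the endpoint and JSON shape follow OpenAI's chat completions API as of early 2023, the proc names are made up for this example, and it assumes your key is in the OPENAI_API_KEY environment variable (check the official API docs before relying on any of this):

```nim
import std/[httpclient, json, os]

# Build the request body for OpenAI's chat completions endpoint.
proc buildRequest(prompt: string): JsonNode =
  %*{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": prompt}]
  }

# Send a single prompt and return the model's reply text.
proc ask(prompt: string): string =
  let client = newHttpClient()
  defer: client.close()
  client.headers = newHttpHeaders({
    "Authorization": "Bearer " & getEnv("OPENAI_API_KEY"),
    "Content-Type": "application/json"
  })
  let resp = client.postContent(
    "https://api.openai.com/v1/chat/completions", $buildRequest(prompt))
  # The reply text sits in choices[0].message.content.
  parseJson(resp)["choices"][0]["message"]["content"].getStr()

when isMainModule:
  echo ask("Write a Nim proc that reverses a string.")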
Something to keep in mind is that these models are trained once-off, and OpenAI's GPT models have reportedly been trained over 2 years ago. The next time they train their models, Nim code generation should be much better, since there will certainly be more published Nim code available. I have no idea when that will be, as training is an expensive task.
If you haven't tried ChatGPT yet, do yourself a favor and visit https://chat.openai.com. There is also Google's Bard, which is similar to ChatGPT but not yet widely available.