“Model Preferences”
There are many possible choices here. To start, I recommend relying on the predefined models, so select “Use pre-defined model”.
The models you can choose from here all have their individual strengths and weaknesses, which can also vary depending on the use case, such as the programming language being used. Recently, I've achieved the best results with “CodeQwen 2.5”.
The “Model Size” indicates the number of parameters a model has; for example, 33B stands for “33 billion” (note that in US English, 1 billion equals 1,000,000,000, i.e. 10⁹). You can think of it as the “capacity of a brain, defined by its number of neurons”, and in AI, more is generally better. However, caution is advised, because a newer generation of models can be significantly better with fewer parameters. (Defining and measuring this “better” is a challenge in itself.)
The model size also has a nearly linear relationship with memory consumption, both in terms of disk space and (V)RAM usage. As a rough rule of thumb, you can use “1B ≘ 1GB” for initial estimates.
Quantization: If your main (or, of course, GPU) memory allows it, a simple rule applies here: the less quantized the model, i.e. the more “bits per value”, the better!
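To make the two rules of thumb above concrete, here is a minimal Python sketch that estimates the memory needed for a model's weights from its parameter count and quantization level (the function name is my own, and the calculation deliberately ignores KV cache and runtime overhead, so treat the result as a lower bound):

```python
def estimate_model_memory_gb(params_billion: float, bits_per_value: int) -> float:
    """Rough memory estimate for the model weights alone.

    params_billion: model size in billions of parameters (e.g. 33 for a 33B model)
    bits_per_value: quantization level (e.g. 16 for fp16, 4 for a 4-bit quant)
    """
    bytes_per_param = bits_per_value / 8
    # 1e9 parameters times bytes-per-parameter, expressed in GB (1e9 bytes)
    return params_billion * bytes_per_param

# A 33B model at 16 bits per value needs roughly 66 GB,
# while a 4-bit quantization of the same model needs roughly 16.5 GB.
print(estimate_model_memory_gb(33, 16))
print(estimate_model_memory_gb(33, 4))
```

Note that the “1B ≘ 1GB” rule corresponds to 8-bit quantization, where each parameter occupies exactly one byte.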
Prompt context size: The context can be seen as the “memory” of a model. However, this context is measured not in characters but in so-called “tokens”. For simplicity, you can think of them as “syllables”.
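If you want a quick feel for how many tokens a piece of text occupies, a common heuristic is that English prose averages about four characters per token (real BPE tokenizers vary, and source code often tokenizes less efficiently). A minimal sketch, with the function name and the 4-characters-per-token default being my own assumptions:

```python
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token estimate: ~4 characters per token for English text.

    This is only a ballpark figure; an actual tokenizer is needed
    for exact counts.
    """
    return max(1, round(len(text) / chars_per_token))

# 400 characters correspond to roughly 100 tokens under this heuristic.
print(rough_token_count("a" * 400))
```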
If you want to provide longer code snippets to the LLM for processing, the “context window” must be large enough to hold them. However, you cannot enter arbitrary values here, as each model has a built-in maximum.
To find out this maximum, you can do the following:
Next to the model selection dropdown you will find a little question mark; under it is a link to the model's page on Hugging Face. Click it.
The “model cards” vary from model to model, and the context length is not always specified.
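When the model card itself is silent, the model's `config.json` on Hugging Face often reveals the maximum context length. The field name varies between model families; a hedged sketch that checks the keys I have most commonly seen (the key list is an assumption and not exhaustive):

```python
import json

# Keys that, in my experience, hold the maximum context length in
# Hugging Face config.json files; different model families use
# different names, so this list is an assumption, not a guarantee.
CONTEXT_KEYS = ("max_position_embeddings", "n_positions", "seq_length", "max_seq_len")

def max_context_from_config(config_text: str):
    """Return the maximum context length found in a config.json, or None."""
    config = json.loads(config_text)
    for key in CONTEXT_KEYS:
        if key in config:
            return int(config[key])
    return None

# Example with a hypothetical config snippet:
sample = '{"model_type": "qwen2", "max_position_embeddings": 32768}'
print(max_context_from_config(sample))
```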
“Threads”: To get good performance without hindering your other work, set the number of threads no higher than what your system can handle without noticeable slowdown; leaving a core or two free for the IDE and the rest of the system is a reasonable starting point.
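As a starting point for the thread count, you can derive a value from the number of CPU cores and reserve a couple for the rest of the system. A minimal sketch (the function name and the default reserve of two cores are my own assumptions, not a recommendation from the tool):

```python
import os

def suggested_thread_count(reserve: int = 2) -> int:
    """Suggest a thread count: all logical cores minus a reserve,
    but always at least one thread."""
    total = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, total - reserve)

print(suggested_thread_count())
```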
Finally, click “Start server” and “OK”.