Configure the CodeGPT plugin

Screenshot of the configuration dialog

  1. Open the “Settings” dialog (File -> Settings or Ctrl+Alt+S)
  2. On the left side choose “Tools” -> “CodeGPT” -> “Providers” -> “LLaMa C/C++ (Local)”
  3. You can select “Enable code completion”, but be aware that the completion requests create a constant background load that can slow down your system noticeably.
  4. Under “Server Preferences” select “Run local server”
  5. “Model Preferences”

    There are many possible choices here. I recommend initially relying on the predefined models, so select “Use pre-defined model”.

    1. The models you can choose from here all have their individual strengths and weaknesses, which can also vary depending on the use case, such as the programming language being used. Recently, I've achieved the best results with “CodeQwen 2.5“.

    2. The “Model Size” indicates the number of parameters a model has; for example, 33B stands for “33 billion” (note that in English usage, “billion” means 1,000,000,000 or 10⁹). It can be thought of as the “capacity of a brain defined by the number of neurons”, and generally speaking, more is better in AI. However, caution is advised, because a new generation of models can be significantly better with fewer parameters. (Defining and measuring this ‘better’ is itself a challenge.)

      Indeed, the model size also has a nearly linear relationship with memory consumption, both in terms of disk space and (V)RAM usage. As a rough rule of thumb, you can assume “1B ≘ 1 GB” for initial estimates (a small calculation sketch follows after this list).

    3. Quantization: if your main memory (or, of course, GPU memory) allows it, the rule here is simple: the less quantized the model, i.e. the more “bits per value”, the better! (The sketch after this list also shows how the bits per value enter the memory estimate.)

    4. Prompt context size: the context can be seen as the “memory” of a model. However, this context is not measured in characters but in so-called “tokens”. For simplicity, you can think of them as ‘syllables’ (see the token-counting sketch after this list).

      If you want to provide longer code snippets to the LLM for processing, the “context window” must also be large enough. However, you cannot enter arbitrary values here, as each model has a built-in maximum.

      To find out this maximum, you can do the following:

      • Under the little question mark next to the model drop-down you will find a link to the model on Hugging Face. Click it.

      • Screenshot from Hugging Face

      The “model cards” vary from model to model, and the context length is not always specified. (A programmatic way to look it up is sketched after this list.)

    5. “Threads”: To ensure good performance without hindering your work, set the number of threads to a value your system can handle without significant slowdown (the sketch after this list shows one way to pick a starting value).

    6. Finally, click “Start server” and “OK”.
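
To make the “1B ≘ 1 GB” rule of thumb and the effect of quantization concrete, here is a small Python sketch. It is purely illustrative and not part of the plugin; the function and the example figures are my own assumptions, and actual memory use is somewhat higher because of the context (KV cache) and runtime overhead.

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GB (ignores KV cache and runtime overhead)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / 1e9

# 7B model with unquantized 16-bit weights: about 14 GB
print(estimate_model_memory_gb(7, 16))  # -> 14.0
# The same model at roughly 4 bits per value (a typical Q4 quantization): about 3.5 GB
print(estimate_model_memory_gb(7, 4))   # -> 3.5
```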
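To get a feeling for how characters relate to tokens, you can count tokens with the tokenizer of your local model. The sketch below assumes the llama-cpp-python package is installed and uses “model.gguf” as a placeholder path for the GGUF file you downloaded.

```python
from llama_cpp import Llama

# Load only the vocabulary/tokenizer, not the full model weights.
llm = Llama(model_path="model.gguf", vocab_only=True)

snippet = "def greet(name):\n    return f'Hello, {name}!'"
tokens = llm.tokenize(snippet.encode("utf-8"))
print(f"{len(snippet)} characters -> {len(tokens)} tokens")
```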
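If the model card does not state the context length, one programmatic way to look it up is to read the model’s config.json from Hugging Face, where the value is usually stored as max_position_embeddings. The repo id below is a placeholder; note that GGUF-only repositories often do not ship a config.json, so you may have to check the original (unquantized) model repository instead.

```python
import json
from huggingface_hub import hf_hub_download

# Placeholder repo id; replace it with the model you actually use.
config_path = hf_hub_download(repo_id="some-org/some-model", filename="config.json")
with open(config_path) as f:
    config = json.load(f)

print("Maximum context length (tokens):", config.get("max_position_embeddings"))
```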
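As a starting point for the “Threads” setting, you can look at the number of logical CPU cores and leave a little headroom for the IDE and the operating system. This is only a rough heuristic of mine, not a recommendation from the plugin’s documentation.

```python
import os

logical_cores = os.cpu_count() or 4            # fallback if the count cannot be determined
suggested_threads = max(1, logical_cores - 2)  # leave ~2 cores free for other work; adjust to taste
print(f"Logical cores: {logical_cores}, suggested threads: {suggested_threads}")
```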