Install the "oobabooga text generation web UI"; it will act as the main server for downloading and running the previously mentioned LLM.
Set the model loader to "llama.cpp" and the cache type to "q4" (this quantizes the KV cache to 4 bits, which saves a good chunk of VRAM at a small quality cost).
Make sure the "openai" extension is enabled (tick it under the Session tab, or launch with the --api flag); it exposes the OpenAI-compatible API that SillyTavern connects to.
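Before pointing SillyTavern at it, you can sanity-check that the API is actually up with a one-off completion request. This is a minimal sketch that assumes the web UI's default API address (http://127.0.0.1:5000); adjust the URL if you changed the port.

```python
# Minimal smoke test for the web UI's OpenAI-compatible API.
# Assumes the default endpoint http://127.0.0.1:5000 -- adjust if needed.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/completions",
    json={"prompt": "Say hi in five words:", "max_tokens": 16},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

If this prints a short completion, the backend is ready for SillyTavern.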
SillyTavern stuff:
Set the text completion preset to "Default" and check the "Streaming" option. Make sure the context (tokens) value matches the context size you set in the web UI; if the two disagree, the prompt gets truncated to whichever limit is smaller. Do bump the response (tokens) slider all the way up, though, since a generous response budget helps with the "summarize" command.
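If you are curious what the "Streaming" checkbox actually changes: the backend sends the completion incrementally as server-sent events instead of one final blob. SillyTavern handles all of this itself; the sketch below only illustrates the mechanism, again assuming the default API address.

```python
# Hedged sketch of what "streaming" means on the wire: the server emits
# "data: {...}" SSE chunks, each carrying a slice of the completion.
import json
import requests

with requests.post(
    "http://127.0.0.1:5000/v1/completions",
    json={"prompt": "Once upon a time", "max_tokens": 64, "stream": True},
    stream=True,
    timeout=60,
) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["text"], end="", flush=True)
```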
For the context and instruct templates, choose "ChatML-Names" for both, and check "Trim incomplete sentences" as well (the sketch at the end of this section shows roughly what this template produces).
For the system prompt, choose "Roleplay-Immersive". NB: for best results, wrap your dialogue in quotation marks and your actions in asterisks; the model will pick up on the formatting and mirror it.
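To make the template choice less abstract, here is a rough sketch of the kind of prompt that gets built with ChatML-Names: every turn is wrapped in ChatML markers, with character names in place of the generic user/assistant roles, plus the dialogue/action formatting from the NB above. The exact template SillyTavern ships may differ in details, and the names here are made up for illustration.

```python
# Illustrative only: approximately what a ChatML-Names prompt looks like.
# SillyTavern's actual template may differ in whitespace and details.
def chatml_names_prompt(system: str,
                        turns: list[tuple[str, str]],
                        next_speaker: str) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for name, text in turns:
        parts.append(f"<|im_start|>{name}\n{text}<|im_end|>")
    # Open the next speaker's turn; the model generates the rest.
    parts.append(f"<|im_start|>{next_speaker}\n")
    return "\n".join(parts)

print(chatml_names_prompt(
    "You are Seraphina, a gentle guardian of the forest.",
    [("Seraphina", '"Welcome, traveler." *offers a warm smile*'),
     ("Traveler", '"Hello there!" *waves cautiously*')],
    "Seraphina",
))
```

Note how the quoted dialogue and asterisked actions appear verbatim in the prompt; that consistency is exactly why the NB formatting advice pays off.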