CommunityData:TACC
== Using the GPUs on Stampede3 ==
The H100 nodes on Stampede3 each have four NVIDIA H100 GPUs. That's pretty impressive, and it's possible to use four nodes at once (this might be useful if you're fine-tuning an LLM or want to run a really, really big one). Actually getting them working takes a bit of doing. Here is a list of steps that ought to work for setting up [https://docs.vllm.ai/ <code>vllm</code>], a good system for running large language models.

=== Setting environment variables for <code>uv</code> and <code>vllm</code> ===
Add the following to your <code>.bashrc</code> so that package and model caches are stored on <code>$SCRATCH</code> instead of in your home directory:
<pre>
export UV_CACHE_DIR=$SCRATCH/.cache
export TRANSFORMERS_CACHE=$SCRATCH/transformers_cache
export HF_HOME=$TRANSFORMERS_CACHE
export XDG_CACHE_HOME=$SCRATCH/.cache
</pre>
Then reload your shell:
<pre>
exec bash
</pre>

=== Install <code>uv</code> in your user site-packages ===
{{note}} Fortunately, TACC has recently installed Python 3.12, so it isn't necessary to use a container anymore. Containers are still sometimes needed to install other software, including some Python packages, but they add some complexity. If you're just starting out with HPC, it's fine not to use a container at first. You can just use <code>uv</code>.

[https://docs.astral.sh/uv/getting-started/ uv] is a modern package manager for Python. You can use the older <code>pip</code> tool to install it into your user site-packages:
<pre>
pip3 install --user -U uv
</pre>
To run uv by typing just <code>uv</code>, define an alias: create a file <code>~/.bash_aliases</code> containing the line <code>alias uv="python3 -m uv"</code>. You can do this by running:
<pre>
echo alias uv=\"python3 -m uv\" >> ~/.bash_aliases
</pre>
If your <code>.bashrc</code> does not already read <code>~/.bash_aliases</code>, also add the line <code>[ -f ~/.bash_aliases ] && . ~/.bash_aliases</code> to it. Now restart your shell by typing <code>exec bash</code>.

=== Create a Python virtual environment for your project ===
{{note}} A good workflow is to develop code locally and then test it on the HPC.
This helps because: (1) you can use your favorite editor locally instead of working with limited tools like Jupyter, terminal editors, or a GUI over the network; (2) running on HPC nodes uses up your allocation; and (3) particularly with the H100 nodes on TACC, you might have to wait (sometimes over a day) for an available node. <code>uv</code> is useful because it creates <code>pyproject.toml</code> and <code>uv.lock</code> files. If you check these into git and sync them to the HPC, <code>uv</code> will make sure that the Python project on your laptop and the one on the HPC use the same package versions.

{{note}} Given the filesystem situation described above, you will normally work with large data objects on the <code>$SCRATCH</code> filesystem, and copy your datasets and results to <code>$CORRAL</code>.

Make a new directory (e.g., using <code>mkdir</code>) for the following steps, and initialize a <code>uv</code> project in it:
<pre>
uv init
</pre>
This creates <code>pyproject.toml</code>; <code>uv</code> creates the virtual environment itself (in <code>.venv</code>) the first time you run a command such as <code>uv add</code>. We recommend adding <code>uv</code> as a development dependency and then activating the environment:
<pre>
uv add --dev uv
source .venv/bin/activate
</pre>

=== Install <code>vllm</code> into your virtual environment ===
# Get an H100 node by running <code>idev -p h100 -t 48:00:00</code>.
# Run <code>module load gcc/13.2.0; module load cuda</code> to load GCC 13.2.0 and the CUDA module, which puts the NVIDIA CUDA compiler (<code>nvcc</code>) on your <code>$PATH</code>.
# Navigate to the directory where you created your virtual environment and install vllm by running <code>uv pip install "vllm[flashinfer]" --torch-backend=auto --no-cache</code>. The quotes keep the shell from treating the square brackets as a filename pattern.<!--- # Run <code> module load gcc/14.2.0 </code> to load a modern version of the C compiler <code>gcc</code>. --->
# Test your installation by seeing if you can run the <code>gpt-oss-20b</code> model in vllm: <code>uv run vllm serve openai/gpt-oss-20b --async-scheduling</code>
<!---
==== Step 3 Explanation ====
The vllm installation statically links the Nvidia cuda compilers (<code>nvc</code> and <code>nvcc</code>). This means that you don't need these on your <code>$PATH</code> after the installation. However, the default <code>gcc</code> on Stampede 3 is out of date. It doesn't support features that vllm uses when it tries to compile cuda graphs, causing an error. By following the above steps the cuda compilers will be installed with <code>vllm</code> but running with a version of gcc that works.
These issues seemed like they might be related to conflicts between the container and the TACC environment, but they are not.

=== Load GCC/14.2.0 by default ===
You might prefer to load gcc 14 only when you need it. But if that gets annoying you can load it automatically. Just add <code>module load gcc/14.2.0</code> to your [https://unix.stackexchange.com/questions/129143/what-is-the-purpose-of-bashrc-and-how-does-it-work .bashrc].
--->
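The setup above can be checked with a few shell functions: one confirms that the cache variables from your <code>.bashrc</code> point under <code>$SCRATCH</code>, and two talk to the server started in the last step. This is a minimal sketch, not an official TACC or vllm tool. The file name <code>check_setup.sh</code> and the functions <code>check_cache_env</code>, <code>wait_for_vllm</code>, and <code>ask_vllm</code> are hypothetical. It assumes <code>curl</code> is available, that vllm is listening on its default port 8000, and that the server exposes its <code>/health</code> and OpenAI-compatible <code>/v1/chat/completions</code> endpoints.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for checking the setup described above.
# Source this file (e.g. `source check_setup.sh`); nothing runs on load.

# check_cache_env: succeed only if each cache variable from .bashrc is set
# and points somewhere under $SCRATCH.
check_cache_env() {
  local var value status=0
  if [ -z "${SCRATCH:-}" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  for var in UV_CACHE_DIR TRANSFORMERS_CACHE HF_HOME XDG_CACHE_HOME; do
    value="${!var:-}"   # indirect lookup of the variable named by $var
    case "$value" in
      "$SCRATCH"/*) echo "ok: $var=$value" ;;
      *) echo "not under \$SCRATCH: $var='$value'" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# wait_for_vllm [URL] [TIMEOUT]: poll the server's /health endpoint every
# 5 seconds until it answers, or give up after TIMEOUT seconds.
# Loading model weights can take several minutes.
wait_for_vllm() {
  local url="${1:-http://localhost:8000}" timeout="${2:-600}" waited=0
  until curl --silent --fail --max-time 5 "$url/health" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no response from $url within ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "vllm is up at $url"
}

# ask_vllm URL MODEL PROMPT: send one request to the OpenAI-compatible
# chat endpoint and print the raw JSON reply. The JSON is built by string
# interpolation, so PROMPT must not contain double quotes or backslashes.
ask_vllm() {
  local url="$1" model="$2" prompt="$3"
  curl --silent --fail "$url/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"
}
```

For example, after <code>exec bash</code>, run <code>source check_setup.sh; check_cache_env</code> to confirm the caches point at <code>$SCRATCH</code>. With the server from the last step running, <code>wait_for_vllm http://localhost:8000 900 && ask_vllm http://localhost:8000 openai/gpt-oss-20b "Say hello"</code> should print a JSON reply. By default, vllm serves the model under the name passed to <code>vllm serve</code>, which is why the same name appears in the request.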
Contributions to CommunityData are released under the Attribution-Share Alike 3.0 Unported license (see CommunityData:Copyrights for details).