{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "hmWbGiyvNNME" }, "source": [ "# LUNAR LANDER\n", "\n", "The goal of the game is simple to state, though landing is not: bring the spacecraft down safely on the designated landing pad. Get ready for a soft, heroic landing! 🚀🌕\n" ] }, { "cell_type": "markdown", "metadata": { "id": "OvS3IEo_Pmv8" }, "source": [ "## Rules and Scoring\n", "At every step of the game you gain or lose points (reward) depending on how you are doing:\n", "\n", "**Landing and speed**: You gain points as you get closer to the landing zone and move slowly. You lose points if you drift away or move too fast.\n", "\n", "**Tilt**: You lose points if the lander is tilted too far. Keep it as level as possible!\n", "\n", "**Legs on the ground**: You gain **10** points for each leg touching the ground in the landing zone.\n", "\n", "**Engines**: You lose points for firing the engines: a little for the side engines and more for the main engine. Use them sparingly!\n", "\n", "**End of the game**: If you crash, you lose **100** points. If you land softly on the pad, you gain **100** extra points!\n", "\n", "An attempt (episode) counts as a success when you score at least **200** points in total!" 
] }, { "cell_type": "markdown", "metadata": { "id": "GiL_39bYC6dT" }, "source": [ "## Installing libraries" ] }, { "cell_type": "markdown", "metadata": { "id": "Fg7rH6DJPneG" }, "source": [] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "id": "isgdPYBPWrSE" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: gymnasium[other] in /home/vscode/.local/lib/python3.10/site-packages (1.2.1)\n", "Requirement already satisfied: farama-notifications>=0.0.1 in /home/vscode/.local/lib/python3.10/site-packages (from gymnasium[other]) (0.0.4)\n", "Requirement already satisfied: typing-extensions>=4.3.0 in /home/vscode/.local/lib/python3.10/site-packages (from gymnasium[other]) (4.15.0)\n", "Requirement already satisfied: numpy>=1.21.0 in /home/vscode/.local/lib/python3.10/site-packages (from gymnasium[other]) (2.2.6)\n", "Requirement already satisfied: cloudpickle>=1.2.0 in /home/vscode/.local/lib/python3.10/site-packages (from gymnasium[other]) (3.1.1)\n", "Collecting seaborn>=0.13\n", " Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m294.9/294.9 KB\u001b[0m \u001b[31m822.2 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: matplotlib>=3.0 in /home/vscode/.local/lib/python3.10/site-packages (from gymnasium[other]) (3.10.6)\n", "Collecting moviepy>=1.0.0\n", " Downloading moviepy-2.2.1-py3-none-any.whl (129 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.9/129.9 KB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting opencv-python>=3.0\n", " Downloading opencv_python-4.12.0.88-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (45.9 MB)\n", "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.9/45.9 MB\u001b[0m \u001b[31m24.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: pillow>=8 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (11.3.0)\n", "Requirement already satisfied: fonttools>=4.22.0 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (4.60.1)\n", "Requirement already satisfied: cycler>=0.10 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (0.12.1)\n", "Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib>=3.0->gymnasium[other]) (2.4.7)\n", "Requirement already satisfied: kiwisolver>=1.3.1 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (1.4.9)\n", "Requirement already satisfied: packaging>=20.0 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (25.0)\n", "Requirement already satisfied: python-dateutil>=2.7 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (2.9.0.post0)\n", "Requirement already satisfied: contourpy>=1.0.1 in /home/vscode/.local/lib/python3.10/site-packages (from matplotlib>=3.0->gymnasium[other]) (1.3.2)\n", "Collecting proglog<=1.0.0\n", " Downloading proglog-0.1.12-py3-none-any.whl (6.3 kB)\n", "Collecting imageio_ffmpeg>=0.2.0\n", " Downloading imageio_ffmpeg-0.6.0-py3-none-manylinux2014_aarch64.whl (25.6 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m25.6/25.6 MB\u001b[0m \u001b[31m27.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: decorator<6.0,>=4.0.2 in /home/vscode/.local/lib/python3.10/site-packages (from moviepy>=1.0.0->gymnasium[other]) (5.2.1)\n", "Collecting 
imageio<3.0,>=2.5\n", " Downloading imageio-2.37.0-py3-none-any.whl (315 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m315.8/315.8 KB\u001b[0m \u001b[31m26.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting python-dotenv>=0.10\n", " Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)\n", "Requirement already satisfied: pandas>=1.2 in /home/vscode/.local/lib/python3.10/site-packages (from seaborn>=0.13->gymnasium[other]) (2.3.3)\n", "Requirement already satisfied: tzdata>=2022.7 in /home/vscode/.local/lib/python3.10/site-packages (from pandas>=1.2->seaborn>=0.13->gymnasium[other]) (2025.2)\n", "Requirement already satisfied: pytz>=2020.1 in /home/vscode/.local/lib/python3.10/site-packages (from pandas>=1.2->seaborn>=0.13->gymnasium[other]) (2025.2)\n", "Collecting tqdm\n", " Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.5/78.5 KB\u001b[0m \u001b[31m25.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib>=3.0->gymnasium[other]) (1.16.0)\n", "Installing collected packages: tqdm, python-dotenv, opencv-python, imageio_ffmpeg, imageio, proglog, seaborn, moviepy\n", "Successfully installed imageio-2.37.0 imageio_ffmpeg-0.6.0 moviepy-2.2.1 opencv-python-4.12.0.88 proglog-0.1.12 python-dotenv-1.1.1 seaborn-0.13.2 tqdm-4.67.1\n" ] } ], "source": [ "# SWIG generates Python bindings for C/C++ code\n", "# Required to build box2d\n", "!pip install -q swig\n", "\n", "# Gymnasium provides the simulation environments and controls, and scores the result\n", "!pip install -q \"gymnasium[classic-control]\"\n", "!pip install -q \"gymnasium[box2d]\"\n", "\n", "# For recording and playing back video\n", "# !pip install moviepy\n", "!pip install -q pyvirtualdisplay\n", "\n", "# DQN (Deep Q-learning) agent, which we will train\n", "!pip 
install -q stable-baselines3\n", "# Install TensorBoard to visualize the training logs\n", "!pip install -q tensorboard\n", "# Install the extras needed to record videos of the runs\n", "!pip install \"gymnasium[other]\"" ] }, { "cell_type": "markdown", "metadata": { "id": "3l9R8gV2C_ZJ" }, "source": [ "## Global variables" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "502iTO5rCz_P" }, "outputs": [], "source": [ "ENV_NAME = \"LunarLander-v3\" # Environment name\n", "VIDEO_FOLDER = \"./video_prueba_de_vuelo\" # Folder where the test-flight videos are saved\n", "EPISODES = 1 # Number of episodes to record in the test flight; the aim is to pick the best one\n", "LOG_DIR = \"./tmp/dqn_lunar\" # Folder where the training logs are saved" ] }, { "cell_type": "markdown", "metadata": { "id": "Ywk9QTWzEOtc" }, "source": [ "## Training the model" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "id": "OtROtkf7gzka", "outputId": "2a034bad-4142-407d-8073-925de62cac5e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using cpu device\n", "\n", "--- INICIANDO ENTRENAMIENTO DQN por 100000 pasos ---\n", "Logging to ./tmp/dqn_lunar/DQN_3\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------\n", "| rollout/ | |\n", "| ep_len_mean | 153 |\n", "| ep_rew_mean | -256 |\n", "| exploration_rate | 0.05 |\n", "| time/ | |\n", "| episodes | 100 |\n", "| fps | 1983 |\n", "| time_elapsed | 7 |\n", "| total_timesteps | 15285 |\n", "| train/ | |\n", "| learning_rate | 0.0001 |\n", "| loss | 4.4 |\n", "| n_updates | 2571 |\n", "----------------------------------\n", "----------------------------------\n", "| rollout/ | |\n", "| ep_len_mean | 326 |\n", "| ep_rew_mean | -85.8 |\n", "| exploration_rate | 0.05 |\n", "| time/ | |\n", "| episodes | 200 |\n", "| 
fps | 1439 |\n", "| time_elapsed | 33 |\n", "| total_timesteps | 47889 |\n", "| train/ | |\n", "| learning_rate | 0.0001 |\n", "| loss | 1.73 |\n", "| n_updates | 10722 |\n", "----------------------------------\n", "----------------------------------\n", "| rollout/ | |\n", "| ep_len_mean | 367 |\n", "| ep_rew_mean | -50.4 |\n", "| exploration_rate | 0.05 |\n", "| time/ | |\n", "| episodes | 300 |\n", "| fps | 1384 |\n", "| time_elapsed | 61 |\n", "| total_timesteps | 84617 |\n", "| train/ | |\n", "| learning_rate | 0.0001 |\n", "| loss | 1.17 |\n", "| n_updates | 19904 |\n", "----------------------------------\n", "\n", "--- ENTRENAMIENTO FINALIZADO. Modelo entrenado guardado. ---\n" ] } ], "source": [ "# ==============================================================================\n", "# TRAINING A DQN AGENT (Stable-Baselines3)\n", "# ==============================================================================\n", "\n", "\n", "# Gymnasium provides the environment and controls, and scores the result\n", "import gymnasium as gym\n", "from gymnasium.wrappers import RecordVideo\n", "import os\n", "# import moviepy.editor as mp # MoviePy import (disabled)\n", "\n", "\n", "# DQN agent, which we will train\n", "from stable_baselines3 import DQN\n", "from stable_baselines3.common.env_util import make_vec_env\n", "from stable_baselines3.common.monitor import Monitor\n", "\n", "\n", "# --- Training setup ---\n", "# Video should only be recorded after training, or in a separate environment.\n", "# For training we use a plain version of the environment without the video wrapper.\n", "\n", "os.makedirs(LOG_DIR, exist_ok=True)\n", "\n", "# Create the training environment (wrapped in Monitor to save logs)\n", "# Note: wind_power and turbulence_power only matter when enable_wind=True\n", "env_train = gym.make(\n", "    ENV_NAME,\n", "    continuous=False,\n", "    gravity=-10,\n", "    enable_wind=False,\n", "    wind_power=15.0,\n", "    turbulence_power=1.5\n", ")\n", "env_train = Monitor(env_train, LOG_DIR)\n", "\n", "# 
Stable-Baselines3 works best with vectorized environments\n", "env_train_vec = make_vec_env(lambda: env_train, n_envs=1)\n", "\n", "\n", "# --- Creating the DQN model ---\n", "# DQN is a deep Q-learning algorithm, suited to environments with discrete actions (such as LunarLander-v3 with continuous=False)\n", "model = DQN(\n", "    \"MlpPolicy\", # Type of neural network (multi-layer perceptron)\n", "    env_train_vec, # The training environment\n", "    learning_rate=0.0001, # Learning rate (typical range: 0.00001 to 0.001)\n", "    buffer_size=10000, # Replay buffer size (typical range: 10000 to 50000)\n", "    learning_starts=5000, # Steps collected before learning starts (typical range: 1000 to 10000)\n", "    batch_size=64, # Can be 32, 64 or 128\n", "    gamma=0.99, # Discount factor (0.90 to 0.99): lower favors quick rewards, higher favors larger long-term rewards and more careful play\n", "    verbose=1, # Show training progress\n", "    tensorboard_log=LOG_DIR\n", ")\n", "\n", "# --- Learning loop ---\n", "# The .learn() method is the core of RL training.\n", "# We train for 50,000 to 200,000 steps (timesteps). This takes a few minutes on Colab.\n", "TIMESTEPS = 100_000\n", "print(f\"\\n--- INICIANDO ENTRENAMIENTO DQN por {TIMESTEPS} pasos ---\")\n", "\n", "# Train!\n", "model.learn(\n", "    total_timesteps=TIMESTEPS,\n", "    log_interval=100\n", ")\n", "\n", "# Save the trained model first, then report that it was saved\n", "model.save(\"modelo_nave_entrenada\")\n", "print(\"\\n--- ENTRENAMIENTO FINALIZADO. Modelo entrenado guardado. ---\")\n", "env_train.close()" ] }, { "cell_type": "markdown", "metadata": { "id": "nq6619RPJBFb" }, "source": [ "## Test Flight" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jL7FmdKqhZaA", "outputId": "faeee27a-cb52-472a-cde7-47828f328b1f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Configurando Pantalla Virtual ---\n", "Advertencia al iniciar pyvirtualdisplay: [Errno 2] No such file or directory: 'Xvfb'. 
Continuaremos.\n", "Grabando 1 episodio(s) en la carpeta: ./video_prueba_de_vuelo\n", "Wrapping the env with a `Monitor` wrapper\n", "Wrapping the env in a DummyVecEnv.\n", "\n", "--- Grabación del video finalizada. ---\n" ] } ], "source": [ "# ==============================================================================\n", "# 4. TEST FLIGHT AND VIDEO RECORDING\n", "# ==============================================================================\n", "from IPython.display import HTML\n", "from base64 import b64encode\n", "import glob\n", "import io\n", "from pyvirtualdisplay import Display\n", "\n", "# Google Colab ships deprecated core dependencies; silence the warnings\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# 1. Set up the virtual display (needed on Colab/Jupyter without a GUI)\n", "# Named virtual_display so it does not shadow IPython's display() function\n", "print(\"\\n--- Configurando Pantalla Virtual ---\")\n", "try:\n", "    virtual_display = Display(visible=0, size=(640, 480))\n", "    virtual_display.start()\n", "    print(\"Pantalla virtual iniciada.\")\n", "except Exception as e:\n", "    print(f\"Advertencia al iniciar pyvirtualdisplay: {e}. Continuaremos.\")\n", "\n", "# 2. Create a new environment with the RecordVideo wrapper\n", "# Create the video folder if it does not exist\n", "os.makedirs(VIDEO_FOLDER, exist_ok=True)\n", "print(f\"Grabando {EPISODES} episodio(s) en la carpeta: {VIDEO_FOLDER}\")\n", "\n", "# Create the test environment that the video wrapper will record\n", "env_test = gym.make(\n", "    ENV_NAME,\n", "    continuous=False,\n", "    gravity=-10,\n", "    enable_wind=False,\n", "    wind_power=15.0,\n", "    turbulence_power=0.1,\n", "    render_mode=\"rgb_array\"\n", ")\n", "# The RecordVideo wrapper must wrap the base environment\n", "env_test_video = RecordVideo(\n", "    env_test,\n", "    video_folder=VIDEO_FOLDER,\n", "    episode_trigger=lambda x: x == 0, # Record only the first episode\n", "    name_prefix=\"prueba_de_vuelo\"\n", ")\n", "\n", "# 3. 
Load the trained model and run one episode\n", "# Load the model we just trained and saved\n", "model = DQN.load(\"modelo_nave_entrenada\", env=env_test_video)\n", "\n", "obs, info = env_test_video.reset()\n", "done = False\n", "truncated = False\n", "while not (done or truncated):\n", "    # The model chooses the action\n", "    action, _ = model.predict(obs, deterministic=True)\n", "    # Apply the action\n", "    obs, reward, done, truncated, info = env_test_video.step(action)\n", "\n", "env_test_video.close()\n", "print(\"\\n--- Grabación del video finalizada. ---\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ZYI_XlxDJJU4" }, "source": [ "## Play the test video" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 485 }, "id": "VAqKRsjN2gb-", "outputId": "7b7f43cc-af29-4f29-edcc-34345da691db" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ Se encontraron 1 videos para reproducir.\n", "Mostrando: ./video_prueba_de_vuelo/prueba_de_vuelo-episode-0.mp4\n" ] }, { "data": { "text/html": [ "\n", " \n", "
--------------------------------------------------
\n", " " ], "text/plain": [ "--------------------------------------------------
\n", " \"\"\"\n", "        display(HTML(html_tag))\n", "\n", "    except Exception as e:\n", "        print(f\"❌ ERROR al procesar o mostrar el video {video_path}: {e}\")\n", "        print(\"Esto podría ser por un archivo muy grande.\")\n", "\n", "\n", "# 2. Find all .mp4 files in the folder\n", "# Sort by creation time so they play in recording order\n", "list_of_files = sorted(\n", "    glob.glob(os.path.join(VIDEO_FOLDER, \"*.mp4\")),\n", "    key=os.path.getctime\n", ")\n", "\n", "# 3. Iterate over the videos and show each one\n", "if list_of_files:\n", "    print(f\"✅ Se encontraron {len(list_of_files)} videos para reproducir.\")\n", "    for video_file in list_of_files:\n", "        display_encoded_video(video_file)\n", "else:\n", "    print(f\"❌ No se encontró ningún archivo de video MP4 en {VIDEO_FOLDER}.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "za-H_n-3JRBd" }, "source": [ "## Test score" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 211 }, "id": "uabPapxG2_nO", "outputId": "bc9ac42e-9bf3-4653-c4e9-bb6c063c6d2a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reward (Recompensa): 2.16\n", "Done (Logro Completar?): False\n", "Truncated (Tuvo que interrumpirse?): True\n", "Info (Información): {}\n" ] } ], "source": [ "# ----------------------------------------------------------------------\n", "# EVALUATION OF THE TRAINING\n", "# ----------------------------------------------------------------------\n", "\n", "# Assumes these variables were already set by the last env_test_video.step() call:\n", "# reward, done, truncated, info\n", "# Note: reward is the reward of the final step only, not the episode total,\n", "# and done means the episode terminated (landed or crashed), not that it succeeded.\n", "\n", "\n", "# Print each variable on its own line\n", "print(f\"Reward (Recompensa): {reward:.2f}\")\n", "print(f\"Done (Logro Completar?): {done}\")\n", "print(f\"Truncated (Tuvo que interrumpirse?): {truncated}\")\n", "print(f\"Info (Información): {info}\")" ] } ], "metadata": { "colab": { "provenance": [], "toc_visible": true 
}, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }