CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving


Dechen Gao Department of Computer Science, University of California, Davis Shuangyu Cai Department of Electrical and Computer Engineering, University of California, Davis caisy21@mails.tsinghua.edu.cn Hanchu Zhou Department of Electrical and Computer Engineering, University of California, Davis Hang Wang Department of Electrical and Computer Engineering, University of California, Davis Iman Soltani Department of Mechanical and Aerospace Engineering, University of California, Davis Junshan Zhang Department of Electrical and Computer Engineering, University of California, Davis
Abstract

To safely navigate intricate real-world scenarios, autonomous vehicles (AVs) must be able to adapt to diverse road conditions and anticipate future events. World model based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing and evaluating world model based autonomous driving algorithms. It comprises three key components: 1) World model (WM) backbone: CarDreamer has integrated some state-of-the-art world models, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: CarDreamer integrates a flexible task development suite to streamline the creation of driving tasks. This suite enables easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on our GitHub page https://github.com/ucd-dare/CarDreamer.

Keywords Autonomous Driving  \cdot Reinforcement Learning  \cdot World Model

1 Introduction

Autonomous vehicles (AVs) are expected to play a central role in future mobility systems, with many promising benefits such as safety and efficiency [1]. Recent years have witnessed great achievements in the development of AVs. In the U.S. alone, AVs have driven millions of miles on public roads [2]. However, achieving robust AVs capable of navigating complex and diverse real-world scenarios remains a challenging frontier [3, 4, 5]. For instance, as calculated by the US Department of Transportation’s Federal Highway Administration, AVs experience a crash rate roughly twice that of conventional vehicles per million miles traveled [6].

The reliability of AVs directly hinges on the generalization capability of autonomous systems in unforeseen scenarios. The world model (WM), which excels at generalization, offers a promising solution through its ability to learn the complex dynamics of environments and anticipate future scenarios. In particular, WMs learn a compact latent representation that encodes the key elements and dynamics of the environment. This learned representation facilitates better generalization, allowing the WM to make predictions in scenarios beyond its training samples. Internally, WMs incorporate components that mimic human-like perception and decision-making, such as a vision model and a memory model [7, 8]. Indeed, humans excel at handling rare or unseen events with proper actions thanks to their internal world models [9]. By emulating cognitive processes akin to human intelligence, WM based reinforcement learning (RL) has demonstrated state-of-the-art performance in domains such as Atari games and Minecraft [10]. However, the application of WMs to autonomous driving remains an exciting open field [5], partially due to the lack of easy-to-use platforms for training and testing such RL algorithms. Developing a learning platform for WM-based autonomous driving can therefore be extremely beneficial for research in this domain.

Thus motivated, we introduce CarDreamer, the first open-source learning platform designed specifically for WM based autonomous driving. CarDreamer aims to facilitate the rapid development and evaluation of algorithms, enabling users to test their algorithms on provided tasks or quickly implement customized tasks through a comprehensive development suite. CarDreamer’s three key contributions include:

  1. Integrated WM algorithms for reproduction. CarDreamer has integrated state-of-the-art WMs, including DreamerV2, DreamerV3, and Planning2Explore, significantly reducing the time required to reproduce the performance of existing algorithms. These algorithms are decoupled from the rest of CarDreamer and communicate through the unified Gym interface. This enables straightforward integration and testing of new algorithms without additional adaptation effort, as long as they support the Gym interface.

  2. Highly configurable built-in tasks with optimized rewards. CarDreamer provides a comprehensive set of driving tasks, such as lane changing and overtaking. These tasks allow extensive customization in terms of difficulty, observability, observation modalities, and communication of vehicle intentions. They expose the same Gym interface for convenient use, and their reward functions are meticulously designed to optimize training efficiency.

  3. Task development suite and visualization server. This suite not only simplifies the creation of customized driving tasks via API-driven traffic spawning and control but also includes a modular observer for easy multi-modal data collection and configuration. A visualization server enables the real-time display of agent driving videos and statistics in a web browser, which accelerates reward engineering and algorithm development by providing immediate performance insights.

In addition to introducing the CarDreamer platform, we present comprehensive experiments that evaluate the overall performance and potential of WMs in autonomous driving. We highlight the WM's predictive accuracy across multi-modal observation inputs. Furthermore, the comparison of different levels of observability and intention sharing demonstrates that communication can markedly enhance both traffic safety and efficiency. To the best of our knowledge, these results represent the first experimental demonstration of WMs' efficacy in autonomous driving tasks with communication of vehicle intentions.

2 Related Work

World Models in Reinforcement Learning. RL usually suffers from low sample efficiency [11], which significantly hinders its practicality, especially for tasks where interacting with the environment is costly and time-consuming. To remedy this issue, model-based RL leverages a world model that explicitly learns environment dynamics to “imagine” future trajectories, allowing agents to interact with the world model instead of the actual environment [7]. Because high-dimensional observations that evolve under intricate dynamics can be intractable, prior works typically learn dynamics in a latent space. Typical designs model the dynamics with a Recurrent State-Space Model (RSSM) [12]. The Dreamer line of work [12, 13, 10] leverages the RSSM to train agents inside the world model's imagination and has demonstrated promising sample efficiency and generalization ability on conventional RL benchmarks. ISO-Dream [14] isolates controllable and non-controllable sources of dynamics changes so that agents can differentiate changes that depend on their actions from those that do not. Planning2Explore [15] promotes exploration by directing it towards states of higher uncertainty, facilitating the learning of more robust dynamics and enabling quick adaptation to new tasks in a zero- or few-shot manner. LEXA [16] utilizes a learned world model to train separate explorer and achiever policies for forward-looking exploration and goal achievement.

World Models for Autonomous Driving. In the field of autonomous driving, there have been mainly two branches of world model research [17, 18]: 1) leveraging world models as neural driving simulators to synthesize realistic driving videos, and 2) utilizing world models in simulation to train and evaluate agent policies. In the first branch, GAIA-1 [19] utilizes a world model to generate realistic driving scenarios given videos, texts, and actions as inputs. DriveDreamer [20] generates driving scenarios along with actions given prior information such as high-definition maps and 3D bounding boxes. ADriver-I [21] eliminates the need for extensive prior information and achieves infinite driving inside its world model by providing both scenario generation and action prediction. DriveDreamer-2 [22] builds upon large language models to make prompting more user-friendly, generating diverse traffic conditions for driving video generation. In the second branch, MILE [9] conducts imitation learning based on a Dreamer-style world model. It learns from offline expert data using road map and camera inputs, and predicts the transitions of future Bird's-Eye Views (BEVs) as an auxiliary task. SEM2 [23] conducts reinforcement learning with a Dreamer-style world model, decoding camera and LiDAR representations into semantic BEV masks. Think2Drive [24] trains DreamerV3 with BEV inputs and tests it on 39 CARLA benchmarks. Our platform aims at facilitating this second branch of research, providing tailored benchmarks and tools for WM based RL algorithms in autonomous driving.

Simulators. Collecting data for autonomous driving in the real world is costly and time-consuming. To this end, various simulators, such as CARLA [25], SUMO [26], and Flow [27], have been developed. CARLA is distinguished by its realistic environmental modeling and image rendering capabilities. However, these simulators are generally designed for general traffic simulation rather than for RL applications. Our platform is specifically tailored for WM-based RL, offering RL rewards and interfaces and automating the collection of training data.

3 Background

This section gives a brief introduction to the two cornerstones on which CarDreamer builds: CARLA [25], a high-fidelity and flexible simulator, and Gym [28], a standard interface for RL training and evaluation.

CARLA. CARLA is an open-source simulator that aims at simulating real-world traffic scenarios. It is based on Unreal Engine, which provides realistic physics and high-quality rendering. CARLA provides digital assets including maps, buildings, vehicles, and various landmarks, and supports various sensors such as RGB cameras, LiDAR, and RADAR. Users can create vehicles or pedestrians and take full control of these actors. It is indeed a very general tool, but the main drawback for its application in RL algorithms also comes from this generality: for example, obtaining observations such as the BEV involves a cumbersome process, impeding its fast deployment in training RL algorithms.

Gym. Gym is a standard interface defined by OpenAI to formalize the communication between agents and environments. Two functions, reset() and step(action), constitute the core of this interface. The former initializes the environment to its start state. The latter takes an action input from the agent, simulates the evolution of the environment, and returns observation data, a reward signal, a terminal indicator, and extra information. In this way, an RL algorithm can easily be tested on various environments with minor adaptation as long as both support the Gym interface. There have been extensive efforts in developing diverse Gym benchmarks, such as Atari games and the DMC suite. However, for WM based RL algorithms for autonomous driving in CARLA, CarDreamer is the first platform that provides diverse urban driving tasks through the Gym interface to facilitate training and evaluation.
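For concreteness, the following minimal sketch shows how an agent interacts with a task through this interface; `make_task` and the task name are placeholders rather than CarDreamer's actual API, and any environment exposing reset()/step(action) would work the same way.

```python
# Minimal interaction loop over the standard Gym interface described above.
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()                                  # initialize to the start state
    total_reward, done, step = 0.0, False, 0
    while not done and step < max_steps:
        action = policy(obs)                           # agent picks an action from the observation
        obs, reward, done, info = env.step(action)     # advance the simulation by one step
        total_reward += reward
        step += 1
    return total_reward

# Example usage with a random policy:
# env = make_task("carla_right_turn_hard")             # hypothetical task factory and task name
# episode_return = run_episode(env, policy=lambda obs: env.action_space.sample())
```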

4 CarDreamer Architecture and Implementation

Figure 1: CarDreamer Architecture. Three key components are highlighted in bold italic font: Built-In Tasks, Task Development Suite, World Model Backbone.

4.1 Overview of CarDreamer

As depicted in Figure 1, CarDreamer comprises three principal components: built-in tasks, the task development suite, and the world model backbone. The task development suite provides a variety of API functionalities, including vehicle spawning, traffic flow control, and route planning within CARLA. An observer module automates the collection of multi-modal observation data, such as sensor data and BEVs, managed by independent and customizable data handlers. This data serves dual purposes: it is consumed by the task and by a training visualization server. The visualization server displays real-time driving videos and environment feedback via an HTTP server, while the task integrates seamlessly with the world model algorithm through the Gym interface. Upon receiving an action as the agent's response, the observer collects data from the data handlers at the subsequent frame, continuing this operational cycle. We now explore each module in detail.

4.2 Built-In Tasks

We have meticulously crafted a wide array of realistic tasks, ranging from simple skills such as lane following and left turning to more complex challenges like random roaming across mixed road conditions that include crossroads, roundabouts, and varying traffic flows. These tasks are highly configurable, offering numerous options that present fundamental questions in autonomous driving.

Observability & Intention Sharing: Partial observability presents a significant challenge in RL, where incomplete state information can exponentially increase the complexity of the input space by encompassing all historical steps [29]. To address the lack of tools tailored to these challenges in autonomous driving, CarDreamer offers three observability settings: 1) Field-of-View (FOV) includes only the vehicles within the camera's FOV. 2) Shared-FOV (SFOV) enables a vehicle to communicate with and collect FOV data from other vehicles within its own FOV. 3) Full Observability (FULL) assumes complete environment and background traffic information. Furthermore, users control whether vehicles share their intentions and with whom they share them. These configurations align with the fundamental questions of “what information to communicate” and “whom to communicate with” [30].
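As an illustration only, these options could be toggled through a task configuration similar to the sketch below; the key names are hypothetical and merely mirror the settings described above, not CarDreamer's actual configuration schema.

```python
# Hypothetical task configuration illustrating the observability and
# intention-sharing options described above; key names are illustrative.
task_config = {
    "observability": "SFOV",        # one of "FOV", "SFOV", "FULL"
    "share_intentions": True,       # whether vehicles broadcast their planned waypoints
    "intention_recipients": "fov",  # whom intentions are shared with, e.g. vehicles inside the FOV
}
```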

Observation Modality: Users can configure the observation space to include various modalities, from sensor data such as RGB cameras and LiDAR to synthetic data such as BEVs. This flexibility supports the development of end-to-end models that are capable of making decisions directly from multi-modal raw sensor data [5] or planning with BEV perception [31].

Difficulty: Difficulty settings primarily affect the density of traffic, posing significant collision avoidance challenges. Because safety-critical events for AVs are rare [32], it is inherently difficult to validate the robustness of AVs due to the infrequent nature of such events [33]. CarDreamer is specifically designed to enable a comprehensive evaluation of safety and efficiency in scenarios that mimic these infrequent but critical events.

Reward function. Each task within CarDreamer is equipped with an optimized reward function, which has been experimentally shown to enable DreamerV3 to successfully navigate through waypoints within just 10,000 training steps (see Section 5 for details). Notably, our empirical findings indicate that rewarding the agent based on its speed or incremental position changes leads to superior performance compared to rewarding absolute position. This is because when rewarded solely for position, the agent can exploit the reward function by making a small initial movement and then remaining stationary, as any further movement risks incurring collision penalties. In practice, we do observe such sub-optimal behavior, where the learned policy converges to a local optimum to avoid collision by remaining stationary. Conversely, rewarding speed forces the agent to maintain continuous motion to accumulate rewards, mitigating the risk of premature convergence to undesirable stationary policies.

Our reward design carefully addresses crucial requirements for driving tasks, such as trajectory smoothness, which are often overlooked in conventional RL algorithms. Typically, these algorithms include an entropy term in their loss function or value estimation to encourage exploration and prevent premature convergence. However, in autonomous driving contexts, this entropy term can incentivize vehicles to follow a zigzag trajectory, as such erratic motion generates higher entropy rewards compared to smoother paths, even though both trajectories might achieve similar progress towards the goal. To counteract this effect, we introduce a penalty term specifically designed to discourage motion perpendicular to the goal direction. As a result, we have developed a reward function that effectively balances goal progression and trajectory smoothness, structured as follows

$$r = \alpha\, v_{\text{parallel}} - \beta\, v_{\text{perp}} - \gamma\, \mathbb{I}_{\text{collision}}. \tag{1}$$

Here, $v_{\text{parallel}}$ and $v_{\text{perp}}$ denote the speed parallel and perpendicular to the goal direction, respectively; $\mathbb{I}_{\text{collision}}$ is the collision indicator; and $\alpha$, $\beta$, $\gamma$ are scaling factors. For tasks like waypoint following, additional reward terms are included for reaching each waypoint.
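A minimal sketch of this reward computation is given below, assuming the ego velocity and goal direction are available as 2-D vectors; the function name and the default scaling factors are ours, not the values used in CarDreamer's built-in tasks.

```python
import numpy as np

def driving_reward(velocity, goal_direction, collided,
                   alpha=1.0, beta=0.5, gamma=10.0):
    """Reward of Eq. (1): encourage speed toward the goal, penalize lateral
    motion and collisions. Scaling factors here are placeholders."""
    goal_dir = np.asarray(goal_direction, dtype=float)
    goal_dir = goal_dir / (np.linalg.norm(goal_dir) + 1e-8)     # unit goal direction
    v = np.asarray(velocity, dtype=float)
    v_parallel = float(np.dot(v, goal_dir))                     # speed along the goal direction
    v_perp = float(np.linalg.norm(v - v_parallel * goal_dir))   # speed perpendicular to it
    return alpha * v_parallel - beta * v_perp - gamma * float(collided)
```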

Interface and Usage. All built-in tasks in CarDreamer expose a unified Gym interface, allowing straightforward training and testing of RL algorithms without additional adaptation. Beyond direct usage, CarDreamer supports a variety of algorithmic paradigms, including curriculum learning, which can leverage the progression from simpler to more complex tasks, and continual learning, which aims to address catastrophic forgetting when learning a new task. Additionally, for imitation learning, CarDreamer simplifies the collection of observational data in the simulator. Although initially designed for WM-based RL algorithms, the Gym interface enables diverse applications across various algorithmic strategies.
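For instance, a curriculum that moves from simpler to harder task variants could be sketched as below; the task-creation call, task names, and agent methods are placeholders, and any agent driven through the Gym interface could be slotted in.

```python
# Sketch of curriculum-style training over CarDreamer tasks, from easier to
# harder variants. `make_task`, the task names, and the agent interface are
# placeholders, not CarDreamer's actual API.
curriculum = ["right_turn_simple", "right_turn_medium", "right_turn_hard"]

def train_on_task(env, agent, num_steps):
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, next_obs, done)  # agent-specific learning step
        obs = env.reset() if done else next_obs

# for name in curriculum:
#     env = make_task(name)
#     train_on_task(env, agent, num_steps=50_000)
```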

4.3 Task Development Suite

For users requiring customized tasks, CarDreamer offers a highly modular Task Development Suite. This suite is adaptable to various levels of customization to satisfy diverse user requirements.

The initial module, the World Manager, caters to basic needs such as varying driving scenarios with different maps, routes, spawning locations, or background traffic flows. The World Manager is responsible for managing ‘actors’, a term borrowed from CARLA [25] that encompasses all entities including vehicles, pedestrians, traffic lights, and sensors. It provides API calls to spawn various actors, particularly vehicles at different locations with either a default or a customized blueprint. These vehicles can be controlled by the user or by an autopilot, a simple rule-based autonomous driving algorithm. Upon reset, the World Manager transparently destroys actors and releases resources.
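The kind of CARLA calls that the World Manager wraps is illustrated below; this is a plain CARLA sketch of spawning and releasing a background vehicle, not CarDreamer's own API.

```python
import random
import carla  # CARLA's Python client library

# Connect to the simulator, spawn a vehicle from a blueprint, hand it to the
# autopilot, and destroy it afterwards -- the steps the World Manager automates.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

blueprint = random.choice(world.get_blueprint_library().filter("vehicle.*"))
spawn_point = random.choice(world.get_map().get_spawn_points())
vehicle = world.spawn_actor(blueprint, spawn_point)
vehicle.set_autopilot(True)   # rule-based background traffic

# ... run the task ...

vehicle.destroy()             # released transparently by the World Manager on reset
```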

The second module, the Observer, automates the collection of observation data across various modalities. While it allows users to easily access pre-defined observation modalities without manual interaction, it also supports extensive customization of data specifications. This is achieved through a series of data handlers, each delivering data for a particular modality, such as an RGB camera handler or a BEV handler. Each data handler is highly modular and independently manages the entire lifecycle of a specific type of data. Users can extend the Observer by registering a new data handler tailored to their own requirements.
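A custom handler might look roughly like the sketch below; the class interface and the registration call are hypothetical stand-ins for CarDreamer's actual handler API, while the velocity query is a standard CARLA actor method.

```python
import numpy as np

# Hypothetical sketch of a custom data handler adding a scalar speed modality.
class SpeedometerHandler:
    """Delivers the ego vehicle's scalar speed as an extra observation modality."""

    name = "speed"

    def reset(self, ego_vehicle):
        self.ego = ego_vehicle

    def observe(self):
        v = self.ego.get_velocity()  # CARLA returns a 3-D velocity vector
        return np.array([np.linalg.norm([v.x, v.y, v.z])], dtype=np.float32)

# observer.register_handler(SpeedometerHandler())  # hypothetical registration call
```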

The third module comprises Route Planners that accommodate diverse needs for task routes. CarDreamer includes several planners: a random planner for exploratory roaming across the entire map, a fixed path planner that creates waypoints connecting user-defined locations, and a fixed ending planner that generates routes using the classical A* algorithm from the current position to a designated endpoint. For additional customization, a base class is available for users to develop their own planners by overriding the init_route() and extend_route() methods, which define the initialization and the per-time-step extension of routes, respectively.
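Following this pattern, a user-defined planner would subclass the provided base class and override these two methods, as in the sketch below; `BasePlanner` is only a stub standing in for CarDreamer's actual planner base class, and the route logic is illustrative.

```python
# `BasePlanner` stands in for CarDreamer's planner base class; the stub below
# only exists to make this sketch self-contained.
class BasePlanner:
    def init_route(self): ...
    def extend_route(self): ...

class StraightAheadPlanner(BasePlanner):
    """Keeps extending the route straight ahead of the ego vehicle."""

    def __init__(self, ego_waypoint):
        self.waypoints = []
        self.ego_waypoint = ego_waypoint  # a carla.Waypoint at the ego's current location

    def init_route(self):
        # Seed the route with the ego vehicle's current waypoint.
        self.waypoints = [self.ego_waypoint]

    def extend_route(self):
        # Called once per time step: append the waypoint ~2 m further along the lane.
        self.waypoints.append(self.waypoints[-1].next(2.0)[0])
```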

Additionally, the suite features a visualization server that seamlessly integrates the output from the Observer with other statistics from environment feedback and displays them via an HTTP server. This automation provides rapid feedback, enhancing reward engineering and algorithm development without extra coding effort.

4.4 World Model Backbone

The World Model Backbone in CarDreamer seamlessly integrates state-of-the-art approaches such as DreamerV2 [13], DreamerV3 [10], and Planning2Explore [15], facilitating rapid reproduction of these models. This backbone architecture is strategically designed to decouple the world model implementation from task-specific components, thereby enhancing modularity and extensibility. Communication between these components is efficiently managed through the standard gym interface, which allows for extensive customization.

This decoupling enables users to easily adapt or replace the default world models with their own implementations, supporting rapid prototyping, benchmarking, and comparative analysis against established baselines. CarDreamer thus provides a comprehensive testbed for world model-based algorithms, fostering an ecosystem conducive to accelerated research and development within this field. The platform encourages users to explore innovative architectures, loss functions, and training strategies, all within a consistent and standardized evaluation framework characterized by diverse driving tasks and performance metrics.

5 CarDreamer Task Experiments

This section showcases the versatility and capabilities of CarDreamer through a comprehensive set of experiments across a wide range of settings. We use DreamerV3 [10] as the model backbone. Section 5.1 focuses on task training and evaluation, where we evaluate the performance of WMs in diverse driving tasks within CarDreamer. In Section 5.2, we assess the ability of WMs to accurately imagine future states under different observation modality settings. Furthermore, Section 5.3 systematically evaluates the significant impact of observability and intention sharing on traffic safety and efficiency.

5.1 World Model Training & Evaluation

Figure 2: Reward curves of different tasks.

We use a small DreamerV3 model with only 18M parameters as the model backbone. The small configuration uses a CNN multiplier of 32, 512 GRU and MLP units, and two-layer MLPs within its RSSM [10]. The memory overhead is around 10 GB, which allows us to train on a single NVIDIA RTX 4090 GPU alongside the running CARLA simulator.
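For reference, these size settings correspond roughly to a configuration like the following; the key names are illustrative rather than the exact DreamerV3 configuration schema.

```python
# Illustrative size settings matching the "small" DreamerV3 described above;
# key names are ours and may not match the DreamerV3 codebase.
small_dreamerv3 = {
    "cnn_depth_multiplier": 32,  # CNN channel multiplier
    "gru_units": 512,            # recurrent state size of the RSSM
    "mlp_units": 512,            # hidden width of the MLPs
    "mlp_layers": 2,             # two-layer MLPs inside the RSSM
    "total_parameters": "~18M",
}
```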

We train the agent on each task. The reward curves with respect to training steps are shown in Figure 2. Simpler tasks with less traffic, such as ‘right turn simple’ and ‘lane merge’, typically converge within 50k steps (about 1 hour), whereas tasks involving denser, aggressive traffic flows, which require collision avoidance, take approximately 150k-200k steps to converge (about 3 to 4 hours).

In our evaluation, we employ several metrics to rigorously assess the performance of autonomous driving agents on the CarDreamer tasks, with results detailed in Table 1. These metrics, computed from per-episode logs (see the sketch after this list), include:

  • Success Rate: This metric measures the percentage of episodes in which the ego vehicle successfully completes the task by reaching a destination point or traveling a predetermined distance without a collision or leaving the lane.

  • Average Distance (m): Represents the average distance traveled by the ego vehicle across all episodes before the episode terminates, either through task completion or due to a failure such as a collision or timeout.

  • Collision Rate (%): Calculates the percentage of episodes where the ego vehicle is involved in a collision.

  • Average Speed (m/s): Measures the average speed maintained by the ego vehicle throughout the task. This metric is indicative of how efficiently the vehicle navigates the environment, balancing speed with safety.

  • Waypoint Distance: This metric quantifies the average divergence from the desired route waypoints. It assesses the vehicle’s ability to adhere to the planned path, reflecting its navigation accuracy and precision in following the given trajectory.
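
These metrics can be computed directly from per-episode logs, as in the sketch below; the record fields are an assumed format of our own, not CarDreamer's actual log schema.

```python
import numpy as np

def summarize(episodes):
    """episodes: list of dicts with keys 'success', 'collided', 'distance',
    'duration', and 'wpt_errors' (per-step distances to the planned waypoints).
    The field names are an assumed logging format, not CarDreamer's own."""
    return {
        "success_rate": np.mean([e["success"] for e in episodes]),
        "avg_distance": np.mean([e["distance"] for e in episodes]),
        "collision_rate": np.mean([e["collided"] for e in episodes]),
        # Time-averaged speed approximated by distance over episode duration.
        "avg_speed": np.mean([e["distance"] / e["duration"] for e in episodes]),
        "wpt_distance": np.mean([np.mean(e["wpt_errors"]) for e in episodes]),
    }
```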

It is worth noting that several tasks, such as ‘right turn’ and ‘left turn’, are notably challenging in environments with background traffic, where traffic flows aggressively and always disregards traffic rules and signs. This behavior increases the potential for collisions with the ego vehicle. Consequently, the AV must accurately predict the future maneuvers of other vehicles to successfully complete the task.

Figure 3: Sampled images during one episode when completing different tasks.
Table 1: Performance metrics in different tasks.
Tasks | Success Rate | Avg. Distance (m) | Collision Rate | Avg. Speed (m/s) | Wpt. Distance
Right turn hard | 97.63% | 41.05 | 2.37% | 3.06 | 0.87
Right turn medium | 93.62% | 40.82 | 6.37% | 3.25 | 0.85
Right turn simple | 100.00% | 41.23 | 0.00% | 2.86 | 0.94
Left turn hard | 90.62% | 46.51 | 9.38% | 1.49 | 0.86
Left turn medium | 84.85% | 44.90 | 6.37% | 3.25 | 0.85
Left turn simple | 97.62% | 45.23 | 0.00% | 2.00 | 1.28
Overtake | 93.02% | 36.47 | 6.98% | 3.11 | 2.01
Four lane | 86.15% | 94.97 | 12.31% | 3.13 | 0.91
Navigation | 91.46% | 168.77 | 2.44% | 4.25 | 0.88
Lane merge | 89.38% | 95.11 | 6.88% | 5.20 | 0.89
Roundabout | 84.16% | 76.90 | 15.84% | 3.48 | 1.03
Figure 4: Comparison of the ground-truth observations and those imagined by the WM in different modality settings: (a) BEV, (b) camera, (c) LiDAR.

5.2 Predictions in Different Observation Modalities

WM’s imagination capability allows it to effectively predict future scenarios and manage potential events. To evaluate the WM’s imagination performance with observations of different modalities, we conduct experiments on the “right turn hard” task. We choose three modalities: BEV, camera, and LiDAR. For each, the WM is required to imagine the observations over several future steps given the start state and a sequence of actions.

The results, illustrated in Figure 4, compare the ground-truth images with the imagined ones across the three modalities. The first row displays the ground-truth observation images, the second row the WM’s imagined outcomes, and the third row the differences between them. We select frames within an imagination horizon of up to 64 time steps.

The findings demonstrate the WM’s proficiency in accurately predicting the future across the different modalities. In the BEV experiment (a), the WM precisely predicts the positions and trajectories of vehicles moving straight and making right turns, as well as the rotation and translation of the BEV with respect to the ego vehicle. Similarly, in the camera and LiDAR settings, the WM successfully predicts a vehicle driving in front of the ego vehicle.
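The per-pixel differences shown in the third row of Figure 4 can be produced with a simple comparison like the sketch below; the array names and shapes are our assumptions, not CarDreamer's visualization code.

```python
import numpy as np

def difference_frames(ground_truth, imagined):
    """Per-pixel absolute error between ground-truth and imagined frames.
    Assumes arrays of shape (T, H, W, C) with values in [0, 255]."""
    gt = ground_truth.astype(np.float32)
    im = imagined.astype(np.float32)
    diff = np.abs(gt - im)            # absolute per-pixel error
    return diff.astype(np.uint8)      # viewable as an image sequence
```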

5.3 Benefits of V2V Communication

A distinctive feature of CarDreamer is its ability to facilitate easy customization of the level to which vehicles communicate. Vehicles can share FOV views, leading to different levels of observability. Moreover, they can share intentions (represented by vehicles’ planned waypoints) for better planning. We utilize this feature to evaluate the impact of communication. An agent is trained and tested on the “right turn hard” task under different settings, i.e., different observability and whether it has access to others’ intentions. The “right turn hard” task is particularly suitable for testing observability and intention communication due to its dense traffic and frequent potential for collisions from vehicles outside the FOV.

The reward curves are shown in Figure 5 and performance metrics are shown in Table 2. Note that successful behavior in making the right turn is approximately indicated by rewards exceeding 250 under our reward functions. The results show that limited observability or lack of intention sharing impedes the agent from completing the task. The evenly sampled images from one episode (shown in Figure 6) provide a good explanation: the agent adopts a conservative and sub-optimal policy, stopping at the crossroad to avoid collision. For example, in the first three rows of Figure 6, the agent stops moving before merging into the traffic flow. In contrast, complete information enables the ego vehicle to successfully execute the right turn.

Figure 5: Reward curves in different communication settings: (a) observability, (b) intention sharing.
Table 2: Metrics in different communication settings.
Settings | Success Rate | Avg. Distance (m) | Collision Rate | Avg. Speed (m/s) | Wpt. Distance
Complete Information | 97.63% | 41.05 | 2.37% | 3.06 | 0.87
FOV Observability | 0.00% | 21.67 | 96.11% | 1.82 | 0.81
SFOV Observability | 0.00% | 23.54 | 100.00% | 1.99 | 0.83
No Intention Sharing | 0.00% | 21.08 | 90.53% | 0.96 | 0.79
Figure 6: Sampled images during one episode in different communication settings.

6 Conclusion

We introduced CarDreamer, an open-source learning platform tailored for the development and evaluation of WM based RL algorithms in autonomous driving. CarDreamer offers a comprehensive set of built-in tasks, a flexible task development suite, and an integrated world model backbone, all aimed at facilitating rapid prototyping of driving tasks and algorithm testing within this specialized domain. With its modular design and diverse task configurations, CarDreamer establishes itself as a flexible and challenging testbed for assessing the performance of WM based autonomous driving systems. The experiments conducted using our platform give a comprehensive evaluation of DreamerV3’s performance on different driving tasks. We emphasize its predictive accuracy across different observation modalities and the significant impact of communication on performance.

Looking to the future, a promising avenue for further development involves the integration of curriculum learning [34] and continual learning [35] strategies. These approaches aim to systematically enhance the learning process by gradually increasing task complexity or continuously integrating new knowledge without forgetting previously acquired information. Furthermore, exploring advanced techniques such as transfer learning [36] and meta-learning [37] could significantly improve the platform’s capabilities for few-shot adaptation to new environments. This would further augment CarDreamer’s utility in developing more generalized and robust autonomous driving approaches.

References

  • [1] Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. A survey of deep learning techniques for autonomous driving. Journal of field robotics, 37(3):362–386, 2020.
  • [2] Matthew Schwall, Tom Daniel, Trent Victor, Francesca Favaro, and Henning Hohnhold. Waymo public road safety performance data. arXiv preprint arXiv:2011.00038, 2020.
  • [3] Ekim Yurtsever, Jacob Lambert, Alexander Carballo, and Kazuya Takeda. A survey of autonomous driving: Common practices and emerging technologies. IEEE access, 8:58443–58469, 2020.
  • [4] Siyu Teng, Xuemin Hu, Peng Deng, Bai Li, Yuchen Li, Yunfeng Ai, Dongsheng Yang, Lingxi Li, Zhe Xuanyuan, Fenghua Zhu, et al. Motion planning for autonomous driving: The state of the art and future perspectives. IEEE Transactions on Intelligent Vehicles, 2023.
  • [5] Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. arXiv preprint arXiv:2306.16927, 2023.
  • [6] Crash rate calculations by US department of transportation federal highway administration. https://safety.fhwa.dot.gov/local_rural/training/fhwasa1109/app_c.cfm. [Accessed 07-05-2024].
  • [7] David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018.
  • [8] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
  • [9] Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zachary Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, and Jamie Shotton. Model-based imitation learning for urban driving. Advances in Neural Information Processing Systems, 35:20703–20716, 2022.
  • [10] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.
  • [11] Yang Yu. Towards sample efficient reinforcement learning. In IJCAI, pages 5739–5743, 2018.
  • [12] Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International conference on machine learning, pages 2555–2565. PMLR, 2019.
  • [13] Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models. arXiv preprint arXiv:2010.02193, 2020.
  • [14] Minting Pan, Xiangming Zhu, Yunbo Wang, and Xiaokang Yang. Iso-dream: Isolating and leveraging noncontrollable visual dynamics in world models. Advances in Neural Information Processing Systems, 35:23178–23191, 2022.
  • [15] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In International conference on machine learning, pages 8583–8592. PMLR, 2020.
  • [16] Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, and Deepak Pathak. Discovering and achieving goals via world models. Advances in Neural Information Processing Systems, 34:24379–24391, 2021.
  • [17] Yanchen Guan, Haicheng Liao, Zhenning Li, Guohui Zhang, and Chengzhong Xu. World models for autonomous driving: An initial survey. arXiv preprint arXiv:2403.02622, 2024.
  • [18] Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, et al. Is sora a world simulator? a comprehensive survey on general world models and beyond. arXiv preprint arXiv:2405.03520, 2024.
  • [19] Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023.
  • [20] Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, and Jiwen Lu. Drivedreamer: Towards real-world-driven world models for autonomous driving. arXiv preprint arXiv:2309.09777, 2023.
  • [21] Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, and Tiancai Wang. Adriver-i: A general world model for autonomous driving. arXiv preprint arXiv:2311.13549, 2023.
  • [22] Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, and Xingang Wang. Drivedreamer-2: Llm-enhanced world models for diverse driving video generation. arXiv preprint arXiv:2403.06845, 2024.
  • [23] Zeyu Gao, Yao Mu, Ruoyan Shen, Chen Chen, Yangang Ren, Jianyu Chen, Shengbo Eben Li, Ping Luo, and Yanfeng Lu. Enhance sample efficiency and robustness of end-to-end urban autonomous driving via semantic masked world model. arXiv preprint arXiv:2210.04017, 2022.
  • [24] Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan. Think2drive: Efficient reinforcement learning by thinking in latent world model for quasi-realistic autonomous driving (in carla-v2). arXiv preprint arXiv:2402.16720, 2024.
  • [25] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017.
  • [26] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wiessner. Microscopic traffic simulation using sumo. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2575–2582, 2018.
  • [27] Cathy Wu, Abdul Rahman Kreidieh, Kanaad Parvate, Eugene Vinitsky, and Alexandre Bayen. Flow: Architecture and benchmarking for reinforcement learning in traffic control. October 2017.
  • [28] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
  • [29] Jakob Foerster, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. Advances in neural information processing systems, 29, 2016.
  • [30] Changxi Zhu, Mehdi Dastani, and Shihan Wang. A survey of multi-agent reinforcement learning with communication. arXiv preprint arXiv:2203.08975, 2022.
  • [31] Dian Chen, Brady Zhou, Vladlen Koltun, and Philipp Krähenbühl. Learning by cheating. In Conference on Robot Learning, pages 66–75. PMLR, 2020.
  • [32] Shuo Feng, Haowei Sun, Xintao Yan, Haojie Zhu, Zhengxia Zou, Shengyin Shen, and Henry X Liu. Dense reinforcement learning for safety validation of autonomous vehicles. Nature, 615(7953):620–627, 2023.
  • [33] Nidhi Kalra and Susan M. Paddock. Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94:182–193, 2016.
  • [34] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009.
  • [35] Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern analysis and machine intelligence, 44(7):3366–3385, 2021.
  • [36] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of transfer learning. Journal of Big data, 3:1–40, 2016.
  • [37] Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. Meta-learning in neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence, 44(9):5149–5169, 2021.