Picture a Wednesday morning at the Ngã Tư Sở intersection in Hanoi. Forty motorcycles jostle for position at the light, weaving between a city bus and a xe ba gác hauling steel rebar. A woman on an electric bike carries a toddler on her lap and a stack of egg cartons behind her. A street vendor pushes a glass-walled cart of bánh mì across two lanes of traffic without looking up. A cyclo carrying tourists drifts left. The light turns green and everyone surges forward simultaneously, merging in a fluid choreography that has no lanes, no right-of-way, and no rules that any traffic textbook would recognize.
Now imagine asking a self-driving car to make sense of all this.
The world's AI was trained somewhere else
Every modern self-driving system, from Tesla's Autopilot to Waymo's robotaxis, relies on deep learning models that were trained on labeled driving data. These models learn what a car looks like, how pedestrians move, where lane markings are, and what traffic signs mean by studying millions of annotated images and LiDAR scans collected from real roads.
The problem is where those roads are.
In 2024, researchers at the Technical University of Munich published the most comprehensive survey of autonomous driving datasets ever conducted. Led by Mingyu Liu and published in the IEEE Transactions on Intelligent Vehicles, the study catalogued 265 datasets used to train perception systems across the global AV industry. Their finding was stark: the overwhelming majority of training data comes from the United States, Western Europe, and a handful of cities in China and Singapore. The authors concluded that this geographic concentration causes AV systems to become overfit to environmental conditions typical of those regions, and that the bias could cause them to fail in varied or unseen settings.
The landmark datasets tell the story clearly. KITTI, which has been the benchmark for 3D object detection since 2012, was collected entirely on the streets of Karlsruhe, Germany. Cityscapes covers 50 German cities. BDD100K draws from New York, San Francisco, and other American metro areas. Waymo Open comes from Phoenix, San Francisco, and Mountain View. Argoverse was collected in Pittsburgh and Miami.
The closest any major dataset gets to Southeast Asia is nuScenes, which split its 1,000 driving scenes between Boston and Singapore. That Singapore portion, roughly 450 twenty-second scenes totaling about two and a half hours of data, represents the entire contribution of Southeast Asia to the world's autonomous driving training data.
What the AI has never seen
Consider what these datasets contain. KITTI's streets feature orderly German intersections with separated bicycle lanes, clear lane markings, and traffic flowing in predictable patterns. Waymo's Phoenix data shows wide, sun-drenched suburban roads with generous lane widths and sparse pedestrian activity. Even nuScenes' Singapore portion captures a city-state known for impeccable road surfaces, strict traffic enforcement, and relatively low motorcycle density.
Now consider what Vietnam's roads actually look like.
Vietnam has over 77 million registered motorcycles, according to the National Traffic Safety Committee's 2024 count, making it one of the most motorcycle-dense countries on the planet. The World Health Organization, which convened a global motorcycle safety summit in Vietnam in November 2024, puts the figure at 74 million, representing more than 90% of all registered vehicles. By either measure, the ratio works out to roughly 740 to 770 motorbikes per thousand people.
These motorcycles don't behave like the sparse two-wheelers occasionally captured in Western datasets. On a Hanoi arterial road during rush hour, it is entirely normal for 40 to 60 motorcycles to be visible in a single camera frame, moving in loose formations with constant lane changes, split-second merges, and centimeters of clearance between riders. This density creates occlusion patterns that are fundamentally different from anything in KITTI or Waymo's data. Motorcycles partially block each other from every angle. A LiDAR scan might return a cluster of points that could be three separate bikes or one bike with cargo, and distinguishing between these cases requires training data from exactly this kind of environment.
Then there are the vehicle types that simply don't exist in the global training vocabulary. A xe ba gác is a three-wheeled cargo vehicle, sometimes motorized and sometimes pedal-powered, that carries everything from construction materials to live chickens. An object detector trained on KITTI has never encountered one. It might classify it as a motorcycle, or a truck, or ignore it entirely because the shape doesn't match any learned pattern. The same goes for cyclos (three-wheeled passenger rickshaws), xe đạp điện (electric bicycles that look like scooters but are legally and dynamically different), pushcart vendors, and the endless variations of motorcycles carrying improbable loads, from plate glass to full-grown trees.
The sign language barrier
Traffic sign recognition is one of the more mature ADAS features. Most commercial systems, including those built by Mobileye, Continental, and other Tier 1 suppliers, were developed and validated against the German Traffic Sign Recognition Benchmark (GTSRB), which contains over 50,000 images of European signs.
Vietnam's traffic signs are governed by QCVN 41:2019/BGTVT, the national technical regulation issued by the Ministry of Transport. This standard defines five main sign groups (prohibition, mandatory, danger warning, directional, and supplementary) with hundreds of individual sign types, including many that are unique to Vietnam: motorcycle-only lane indicators, three-wheeler restriction signs, and Vietnamese-language supplementary panels that provide context a pictogram alone cannot convey.
Here is where a commonly repeated claim needs correcting. It is often stated that Vietnam is not a signatory to the Vienna Convention on Road Signs and Signals, the 1968 international treaty that standardizes road sign design. In fact, Vietnam acceded to the convention on August 20, 2014, and its post-2014 sign regulations reflect this. Vietnamese warning signs use red triangles and prohibition signs use red circles, broadly consistent with the Vienna framework.
But "broadly consistent" is not the same as "identical." Vietnamese signs include local variations, Vietnamese-text panels, and motorcycle-specific categories that do not appear in European or American training corpora. More practically, Vietnamese roads present dense visual clutter that degrades sign detection regardless of what the signs look like. Hanging electrical cables, advertising banners, shopfront signage, and tree canopy routinely occlude traffic signs in ways that clean German test roads do not.
The real challenge, though, may be simpler than sign design. Many Vietnamese roads, particularly in older urban areas and rural regions, have faded, damaged, or entirely absent lane markings. Sidewalks routinely function as driving lanes. Road boundaries are ambiguous where pavement meets unpaved shoulder without a curb. For a lane-keeping system trained on the crisp white and yellow lines of an American highway, these conditions represent a kind of sensory void.
When ADAS meets reality
These are not hypothetical concerns. Every VinFast electric vehicle currently shipping, from the compact VF 6 to the full-size VF 9, includes Level 2 ADAS features: adaptive cruise control, highway assist, lane keeping, automatic emergency braking, blind spot detection, and traffic sign recognition. VinFast is working with ZF of Germany for sensor hardware and has announced a partnership with Israel's Autobrains, disclosed in January 2026, to co-develop a camera-only autonomous driving architecture using seven cameras and compact AI processing, with the explicit goal of eliminating the need for expensive LiDAR and pre-built HD maps.
The scale is significant. VinFast delivered nearly 197,000 electric vehicles globally in 2025, more than double the prior year, with about 175,000 of those going to Vietnamese customers. The company has set a target of 300,000 deliveries for 2026. Every one of these vehicles has ADAS sensors generating data on Vietnamese roads.
Meanwhile, an NHTSA recall in August 2025 flagged all 6,314 US-market VinFast VF 8s for a lane keeping assist defect where the system activated unexpectedly on wide turns and applied steering force that was difficult for drivers to override. This is a problem that occurred on American roads, where lane markings are clear and traffic is orderly. Published studies quantifying how the same ADAS stack performs in Vietnamese conditions, where lane markings may not exist and the lane is shared with dozens of motorcycles, are essentially nonexistent in the public literature.
This is not a critique of VinFast specifically. Toyota Safety Sense, Honda Sensing, and Hyundai SmartSense all ship in premium trims in Vietnam, and none of these systems were designed or primarily validated for motorcycle-dominant, lane-marking-sparse traffic. The pattern repeats across the industry: systems designed for one driving environment encountering another.
A data problem, not an algorithm problem
The fundamental issue is not that self-driving algorithms are incapable of handling Vietnamese traffic. It is that they have never been given the chance to learn from it.
Indian researchers at IIIT Hyderabad recognized this same gap for their own roads and created the India Driving Dataset (IDD), which now includes roughly 95,000 images, plus a 3D variant with 12,000 LiDAR frames, specifically capturing unstructured traffic in Hyderabad and Bangalore. Their original 2019 paper demonstrated what happens when you test a model trained on orderly German streets against chaotic Indian intersections: performance drops dramatically because the class distributions, scene layouts, and road user behaviors are fundamentally different.
Vietnam has no equivalent. The largest documented Vietnamese traffic sign dataset covers approximately 100 sign classes with a few thousand images, roughly three orders of magnitude smaller than the Western benchmarks that commercial systems train on. VinAI Research, Vingroup's AI laboratory, operates an NVIDIA DGX SuperPOD and has published work on parking assistance and driver monitoring, but its data collection fleet has operated in the United States and Europe. No publicly released Vietnamese on-road perception dataset comparable to IDD or nuScenes exists.
This gap matters for a specific technical reason. Deep learning models generalize well within the distribution of their training data and poorly outside it. A model that learned "motorcycle" from seeing a few thousand examples of single riders on wide American roads will not reliably detect a cluster of three motorcycles riding abreast with a passenger sidesaddle on one, a dog sitting between the rider's feet on another, and cargo strapped to the third. These are not edge cases in Vietnam. They are Tuesday.
Who will build the data foundation?
The pattern across the global AV industry is consistent. Wherever self-driving technology has made real progress, someone first invested heavily in collecting and labeling local driving data. Waymo spent years mapping Phoenix block by block before launching its robotaxi service. Baidu's Apollo Go started in purpose-built technology zones in Chinese cities, collecting data in controlled conditions before expanding to open roads. Tesla's approach relies on billions of miles of fleet data from customer vehicles in the countries where it sells.
Southeast Asia's most complex traffic environments, the motorcycle-dense cities of Vietnam, Indonesia, and Thailand, remain outside this cycle. The data hasn't been collected. The labels haven't been created. The models haven't been trained.
This creates a strange situation. Vietnam is experiencing rapid motorization, with hundreds of thousands of ADAS-equipped vehicles entering service each year. The technology in those vehicles was developed using data from roads that look nothing like Vietnam's. And the local data that could close this gap, that could teach a perception model what a xe ba gác looks like, how a motorcycle swarm moves, and where the road ends when there are no lane markings, largely does not exist in any structured, usable form.
Someone will eventually build this data foundation for autonomous driving in Southeast Asia. The companies that do it first, capturing and annotating the millions of frames needed to train perception models for motorcycle-dominated traffic, will hold something genuinely valuable: the ground truth of roads that no one else has mapped.
The question is not whether Vietnamese roads are too chaotic for self-driving technology. The question is whether the AI industry is willing to do the hard, unglamorous work of learning to see them.