
/jp/ - Otaku Culture


>> No.45203001
File: 377 KB, 798x877, ENet.png

>>45200063
Using real game frames would be ideal in theory, but the problem is annotating all the objects in them, which would take ages to do manually. Normally for these kinds of datasets people hire a team to collect and annotate the images for you, e.g. real-life road images, but that's infeasible given my timeframe. The next best thing is synthetic data: I extracted all the sprites + masks, annotated them automatically, and stack them on top of each other to create an arbitrary number of training frames. Generally the more frames you train on, the better, but due to hardware constraints (the lecturer is supposed to be able to run this) I have to limit how many I use. The resolution is the EoSD playing field (384 by 448 pixels).
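To give a rough idea of what "stack sprites and annotate automatically" means, here is a minimal sketch in Python (PIL + numpy). The paths, class names, and sprite folder layout are made up for illustration, not the actual project code; the real pipeline may differ.

    # Sketch: composite extracted sprites onto a background and derive the
    # per-pixel label map automatically from each sprite's alpha mask.
    import random
    from pathlib import Path

    import numpy as np
    from PIL import Image

    W, H = 384, 448                                   # EoSD playing field
    CLASSES = {"player": 1, "bullet": 2, "enemy": 3}  # 0 = background (hypothetical ids)

    def make_frame(bg_path, sprite_dir, n_sprites=60):
        frame = Image.open(bg_path).convert("RGBA").resize((W, H))
        label = np.zeros((H, W), dtype=np.uint8)          # per-pixel class ids
        sprites = list(Path(sprite_dir).glob("*/*.png"))  # assumed <class>/<sprite>.png layout
        for path in random.sample(sprites, min(n_sprites, len(sprites))):
            sprite = Image.open(path).convert("RGBA")     # alpha channel acts as the mask
            x = random.randint(0, W - sprite.width)
            y = random.randint(0, H - sprite.height)
            frame.alpha_composite(sprite, (x, y))
            mask = np.array(sprite.split()[-1]) > 0       # automatic annotation from the mask
            label[y:y + sprite.height, x:x + sprite.width][mask] = CLASSES[path.parent.name]
        return frame.convert("RGB"), label

Calling make_frame() in a loop with different backgrounds and sprite samples gives you as many annotated frames as you want, which is the whole point of going synthetic.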

This has already been done once before, in the paper "Towards an AI playing Touhou from pixels: a dataset for real-time semantic segmentation", which presents the same idea with the goal of training an AI to beat Touhou. However, they used background images that aren't in the games at all, and their models were shown to perform poorly at detecting bullet clusters and when the background is messy. I improve on their data generation method with real backgrounds and danmaku clusters, collected by modifying EoSD's files so that I can screenshot whatever I want straight from the game. The goal is to train the same models on the enhanced dataset, hopefully fixing those issues and improving inference FPS, and in the future potentially combining this with reinforcement learning to train an AI that can beat Touhou from looking at pixels.
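For the FPS comparison, the measurement itself is straightforward; something like the following generic PyTorch timing loop would do it (this is my own sketch, not code from the paper, and it assumes whatever segmentation model you pass in, e.g. an ENet):

    # Sketch: measure inference FPS of a segmentation model on 384x448 frames.
    import time
    import torch

    def measure_fps(model, n_frames=200,
                    device="cuda" if torch.cuda.is_available() else "cpu"):
        model = model.to(device).eval()
        frame = torch.rand(1, 3, 448, 384, device=device)  # dummy 384x448 RGB frame
        with torch.no_grad():
            for _ in range(10):                             # warm-up passes
                model(frame)
            if device == "cuda":
                torch.cuda.synchronize()                    # don't time queued kernels
            start = time.perf_counter()
            for _ in range(n_frames):
                model(frame)
            if device == "cuda":
                torch.cuda.synchronize()
        return n_frames / (time.perf_counter() - start)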
