Well, the entire process basically goes something like this:
1. Identify where to get the 10+ minutes of vocal material needed to train the model (in this case it was Clayman)
2. Use Ultimate Vocal Remover to create the vocal stems from all of the Clayman songs
3. Use Adobe Audition 3 to remove any clean/mumbling vocals and just leave the screams/growls in the vocal stems
4. Use RVC-beta (AI voice cloning tool) to train a model based on the vocal stems. This took about 30 mins with an i7-8750H processor and an RTX 2070 graphics card. You can train the model for up to 200 epochs, but I limited it to 40 because the temperatures were punishing both my CPU and GPU.
5. Use Ultimate Vocal Remover to create the vocal stems from the DotL tracks, as well as the instrumental stem.
6. Use RVC-beta to clone Anders' voice onto the selected DotL track. This only takes a minute or two. The only setting I really changed from the default was the one that accounts for Mikael's vocals being in a lower register than Anders' Clayman vocals.
7. Import the selected DotL instrumental and vocal stems into Adobe Audition 3, alongside the cloned Anders vocal track.
8. Edit the cloned Anders vocal track in Adobe Audition 3 to even out any inconsistencies in volume, and mute any sound artefacts in the non-vocal sections of the track. There was also the occasional need to rearrange parts of the vocal track where the cloning wasn't so good, or to use Mikael's vocal track as a backing vocal where a section of the cloned track was particularly weak.
9. Save the mix to an mp3.
And that's pretty much the whole process. Steps 1-5 obviously only need to be done once, so the process becomes a lot quicker once you have the model trained and all of the instrumental/vocal stems created. Sometimes step 8 can take a while depending on how well the cloning process worked. The majority of the cloned vocals came out 90% fine, so only some minor adjustments were needed, but there were a couple of tracks (Gateways and IBT) where it just doesn't work very well. The tracks that worked best are the ones with standard growling across the board. Overall Anders and Mikael seem like a pretty good match for cloning vocals (I assume it'd be quite good in reverse, too). I also left Mikael's cleans in a couple of songs, as the model doesn't work when putting growls over cleans. In theory you could create a model for Anders' clean vocals to clone over Mikael's cleans... but really, why do that?
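If you'd rather script the "mute the artefacts in the non-vocal sections" part of step 8 than do it by hand, here's a rough sketch of the idea. It isn't what was done in Audition, just one way to approximate it: wherever the original vocal stem is near-silent, zero out the cloned vocal. The function names, the frame length and the threshold are all made up for illustration, and it assumes both tracks are already loaded as lists of float samples at the same sample rate and lined up in time.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive frame of `frame_len` samples."""
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels


def gate_cloned_vocal(cloned, original, frame_len=1024, threshold=0.01):
    """Mute frames of `cloned` wherever `original` is quieter than `threshold`.

    Both inputs are lists of floats in [-1.0, 1.0]. The threshold and frame
    length are illustrative guesses, not tuned values.
    """
    n = min(len(cloned), len(original))
    levels = frame_rms(original[:n], frame_len)
    gated = []
    for i, level in enumerate(levels):
        frame = cloned[i * frame_len:min((i + 1) * frame_len, n)]
        # Keep the cloned audio only where the original stem has real signal.
        gated.extend(frame if level >= threshold else [0.0] * len(frame))
    return gated
```

A hard on/off gate like this can click at the frame edges, so in practice you'd probably still want short fades around each cut, or just tidy those spots up in your editor.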
Overall results might be even better if the model was trained for more epochs, but 30 mins at a nonstop 96°C for my CPU was as far as I was willing to go. It's also possible that you don't really see any improvement past 40 epochs. To improve the model I'd be more inclined to take the vocal stems from Colony and use those to supplement the training data alongside the Clayman vocals. Maybe I'll do that at some point.
If you have a beefy enough CPU/GPU and want to give it a go, I used the below YouTube video to install and navigate RVC-beta:
You can find UVR via Google if you don't already have it; it's free and open source. RVC-beta actually does have a vocal stem separation tool you can use, although I'm not sure how good it is. I used Adobe Audition 3 for editing, but you can use any music editing software for that job.
It's not really a very complicated process overall; the first five steps are just a bit time-consuming. Once you've done those, the rest is fairly straightforward, depending on how well the vocals cloned.
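On the register setting from step 6: if you want a starting point rather than guessing, the gap between two pitches can be worked out in semitones (12 per octave). The sketch below assumes the setting takes a whole number of semitones, and the frequencies in the usage lines are placeholder numbers, not measurements of either vocalist. Growls and screams don't have a clean pitch anyway, so treat the result as a rough first guess and adjust by ear.

```python
import math


def semitones_between(source_hz, target_hz):
    """Signed semitone offset that moves `source_hz` to `target_hz`."""
    return 12 * math.log2(target_hz / source_hz)


def suggested_transpose(source_hz, target_hz):
    """Round the offset to a whole number of semitones (assumed setting format)."""
    return round(semitones_between(source_hz, target_hz))


# Placeholder pitches only: a source an octave below the target.
print(suggested_transpose(110.0, 220.0))
# And the reverse direction, which comes out negative.
print(suggested_transpose(220.0, 110.0))
```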