Musical Analysis at Moogfest

Technology with TechStuff from HowStuffWorks.com.

Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm a senior writer with HowStuffWorks.com. I talk about all things tech, and today we're going to get a little musical, with a little help from our buddy Noel. Noel is the producer extraordinaire: he's the head of podcast production here at HowStuffWorks and also one of the co-hosts of Stuff They Don't Want You to Know. Noel went to Moogfest and got the chance to talk to a whole bunch of really cool people, including Alexander Lerch, and we'll hear more about that a little later in this podcast. Moogfest is ostensibly about music and technology, but it actually involves lots of other stuff, not just those two already broad fields but other ones as well, including elements of philosophy and even particle physics. We'll have an episode in the near future that includes some elements from interviews we had with folks from the Large Hadron Collider.
So Moogfest has all sorts of really smart, talented people getting together for these incredible symposia and performances, and Noel was able to go and talk with someone about some really cool stuff that ties into what I wanted to chat about today. You know, once upon a time here at HowStuffWorks, we had a show called Stuff from the B-Side, a podcast all about music. Episodes focused on everything musical, including more general concepts and philosophical ideas. And music and technology really are closely tied together. After all, almost every musical instrument is some form of technology, ranging from relatively primitive percussion instruments all the way up to high-tech digital rigs. So I thought it might be cool to revisit music and tech and look at a particular subset of it: musical analysis and music generation. Music analysis and technology are also related in that we now have various automated recommendation engines that suggest music for us to listen to based on what we've already said we enjoy. These engines look for new pieces of music that in some way match criteria we seem to find appealing.
We have indicated to the service that we like a particular type of music, so it tries to find matches that follow along the same lines. As these engines become more adept at figuring out what qualities we really enjoy, they can home in on songs that appeal to us, perhaps even changing them up based on other criteria, like the time of day or an activity we're doing. For example, Google Music (and this show is not sponsored by Google Music or anything of that nature) will detect if I'm on my way somewhere and might suggest music that would be conducive to a trip, or if it knows I'm at the gym, it may suggest music that's good for keeping my heart rate up, stuff like that. So let's imagine a hypothetical situation. I've just woken up, and the recommendation engine finds some peppy music to get me on my way. Google Music is saying, hey, it's Monday morning, you need all the help you can get; here's a radio station based off the song "Walking on Sunshine" by Katrina and the Waves. Then my phone detects that I'm going to the gym, so the music engine switches to songs meant to keep me moving at a particular pace while I desperately try to find the exit to the gym. I'm sorry, I mean, to actually work out.
In that case, it's probably something with a nice driving beat and a good tempo. These are basic things that music engines can do now, but the reason they can do them at all is music analysis. This isn't always done in an automated fashion. In fact, automating music analysis is pretty tricky, so sometimes it relies instead on a lot of work done by real, live human beings. Take the Music Genome Project, for example. This is the database that the internet radio service Pandora relies upon when it creates a radio station based off an artist or a song that you've submitted as the seed for a new channel. For more than ten years, Pandora's staff have analyzed and categorized music, breaking songs down into their basic components, which they call genes. These are the elements that make songs what they are. I find this approach both fascinating and a little odd, because it seems a little weird to take a really awesome song, let's say Blue Öyster Cult's "Don't Fear the Reaper," one of the best songs ever written, and sift it down to all the little basic components, the genes, that make it up. It also reinforces the notion that a song is more than just the sum of its parts.
If you were to look at those components and attempt to make a song that included all of them, I bet it wouldn't be half as awesome as "Don't Fear the Reaper." So you take a song and identify all its different qualities. That may involve things like the tempo of the song; its structure as far as verses and choruses are concerned; whether there are vocals and what kind of vocalists there are; what kind of instruments are used; all of these individual, tiny components of the song. You put them into, say, a spreadsheet, and that represents the collection of genes possessed by "Don't Fear the Reaper." Now take that same collection, give it to a musician, and say, I want you to write me a song that has all of these components in it. Well, again, you're probably not going to get "Don't Fear the Reaper." You'll get something, and maybe it will be good. Maybe it'll even be better than "Don't Fear the Reaper." I doubt it, but there's something magical, or apparently magical, about music that transcends the quantitative elements we can list. Now, Pandora's Music Genome Project identifies four hundred fifty different musical attributes, or genes. They include lots of different types of data. Some are relatively straightforward, such as: does the song have a vocalist?
If it does have a vocalist, is it a male vocalist or a female vocalist? Are there multiple vocalists? Then it starts getting way more granular. If a song has electric guitar, for example, there might be a subset of information about that, such as how much distortion is on that guitar: does it have a lot of distortion in this song or not? So you keep subdividing down the line, and the same is true for other instruments. Not all songs have the same number of genes, meaning some genres of music are actually easier to describe with fewer terms than others. For example, rock songs have about a hundred fifty genes; you can break a rock song down into about a hundred fifty individual components. Rap songs are more like three hundred fifty. So there are gradations and variations between songs, even within the same genre. So, to make a recommendation engine, you first have to put all the music in the library through this process: you need to identify the important qualities that make the music what it is.
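To make the gene-matching idea concrete, here is a minimal Python sketch, assuming each song's genes are stored as numeric scores between 0 and 1. The gene names, the `Song` type, and the averaged-difference similarity measure are all hypothetical choices for illustration; Pandora's actual gene list and matching logic are not public and are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical gene names for illustration only; the real Music Genome
# Project uses hundreds of genes, and its schema is not reproduced here.
GENES = ("has_vocals", "female_vocals", "guitar_distortion", "tempo_fast", "minor_key")


@dataclass
class Song:
    title: str
    genes: dict[str, float]  # each gene scored from 0.0 (absent) to 1.0 (strong)


def similarity(a: Song, b: Song) -> float:
    """Return 1.0 for identical gene profiles and 0.0 for maximally different ones."""
    diffs = [abs(a.genes.get(g, 0.0) - b.genes.get(g, 0.0)) for g in GENES]
    return 1.0 - sum(diffs) / len(GENES)


def build_station(seed: Song, library: list[Song], threshold: float = 0.8) -> list[Song]:
    """Return library songs at least `threshold` similar to the seed, best match first."""
    candidates = [s for s in library if s.title != seed.title]  # don't replay the seed
    scored = [(similarity(seed, s), s) for s in candidates]
    matches = [(score, s) for score, s in scored if score >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in matches]
```

Lowering `threshold` is the knob for an unusual seed song that has few close matches in the library; raising it keeps the station tightly focused.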
You could use something like a spreadsheet to lay it all out, and then when someone wants to make a new radio station off of a song, you can use that song's genome, all the genes listed for that specific song, to guide a decision engine to pick other songs that are similar to the first one within a certain degree. You could even set this dynamically in your search engine. Let's say you are the one designing the newest, latest and greatest version of Pandora, and you've got an enormous database of music that's all been analyzed by professionals. We're talking about actual musicians and musicologists who have listened to the music, broken it down into its basic elements, and identified all of them. Someone joins your service and says, I'm going to make a radio station based off the song "The Statue Got Me High" by They Might Be Giants. You would access the database, pull the record for "The Statue Got Me High," look at all the genes associated with it, and then look for a certain percentage of similarity with other songs: are there other songs that have the same genes as this song does? If so, serve one up and see if the person likes it. You might set the threshold higher or lower. If the seed is a particularly avant-garde song, there may not be a lot of other songs that strongly resemble it, so you have to play a little fast and loose.

Now, an important component of this service is user feedback. Services like Pandora nearly always include a way for users to indicate whether they like a particular song, and the recommendation engine uses that data to fine-tune its selections. No two songs are going to be exactly alike, so it may be that the ways the new song deviated from your seed song were the very parts that made you detest it. The engine said, well, this song resembles the seed song, the original tune; let's serve it up. You listen to it for about three seconds and say, no, this is not what I want, and you give it a thumbs down. The algorithm might say, all right, I'm going to keep note of where that song was the same as the original and where it was different. Meanwhile, I'll serve up this next song that has some similarity. If you say, yeah, that's a good song, I really like it, and give it a thumbs up, the recommendation engine starts looking at the differences between the song you said no to and the song you said yes to, and it starts to identify stuff you might not even be aware you don't like.
It might be certain elements of songs, and the recommendation engine has figured it out. Maybe it's figured out that Jonathan really doesn't like it when there's a clarinet in the song for no reason. He isn't able to vocalize that, because he's not consciously aware of it, but every time a clarinet pops up he says no to the song, so we're going to put the kibosh on the clarinet from here on out. That was just a random example; I don't have a hatred of the clarinet. But it is a way for the engine to work with the user to get a better understanding of the type of songs it should serve up. Now, there are plenty of other ways to analyze and describe music besides this genetic approach. There are entire courses dedicated to this; musicology is a rich and interesting field, and some of these approaches go beyond the components that are directly perceptible. These analytic methods try to capture the essence of the feel of music. For example, if you take a bunch of components individually, you might describe the music quantitatively with accuracy, but you can't capture how they collectively create a particular effect. Perceptual analysis attempts to take human perception and emotional reaction into account along with everything else.
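The thumbs-up/thumbs-down loop described above can be sketched as a simple per-gene weighting scheme. This is a hypothetical Python illustration, not Pandora's method: it assumes songs are described by gene-strength dictionaries, and the fixed learning rate and additive update rule are assumptions chosen for simplicity.

```python
class FeedbackModel:
    """Toy listener profile learned from thumbs-up/thumbs-down ratings.

    Hypothetical sketch: every rating nudges a weight for each gene in the
    rated song, so a gene that keeps showing up in rejected songs (say, a
    clarinet) drifts negative even if the listener never names it.
    """

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.weights: dict[str, float] = {}

    def rate(self, song_genes: dict[str, float], thumbs_up: bool) -> None:
        """Record one rating; gene strengths are assumed to lie in [0, 1]."""
        direction = 1.0 if thumbs_up else -1.0
        for gene, strength in song_genes.items():
            old = self.weights.get(gene, 0.0)
            self.weights[gene] = old + direction * self.learning_rate * strength

    def score(self, song_genes: dict[str, float]) -> float:
        """Predicted preference for an unrated song; higher is better."""
        return sum(self.weights.get(g, 0.0) * s for g, s in song_genes.items())

    def disliked_genes(self, cutoff: float = -0.5) -> list[str]:
        """Genes whose learned weight has fallen to `cutoff` or below."""
        return sorted(g for g, w in self.weights.items() if w <= cutoff)
```

For example, rating two guitar-and-clarinet songs down and two guitar songs without clarinet up leaves the guitar weight at zero while the clarinet weight goes negative, so `disliked_genes()` surfaces the clarinet on its own.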
But why is the Music Genome Project powered by humans? Why is Pandora using actual human beings to listen to music and write out all these genes? Couldn't you find some easier way? Well, listening to music and being able to describe its structure beyond some relatively simple angles is a particularly tricky computational problem. It's something that's easy for humans and hard for machines. In two thousand five, Wei Chai of MIT wrote a paper titled "Automated Analysis of Musical Structure," in which she laid out the challenges of creating an automatic approach to analyzing music. Her paper is ninety-six pages long, which gives you an idea of how complicated a problem this is. Chai's team relied on music cognition, machine learning, and signal processing to segment and analyze pieces of music, with the goals of isolating and analyzing the recurrent structures of a piece (you know, the whole verse, chorus, verse, all my fellow Pixies fans out there), the chord progressions or key changes present in the music, and identifying the parts of a piece that make it representative of the whole. In other words, finding that hook, that element of a song that makes it stand out.
Chai's team had to figure out how to make a machine do stuff that we tend to do naturally, even without the benefit of formal musical training. For example, I have never taken any class beyond music appreciation, which is about as one-oh-one as you get, and yet I am able to vocalize certain things about music easily. I can recognize differences that a computer cannot natively detect without a whole lot of work. The whole paper is available to read online, and I recommend checking it out; there's a PDF you can download for free. It's fascinating. It delves into not just the programming challenge of creating this analysis software but also the peculiarities of music itself. For example, what makes one piece of music more memorable than another? What role does repetition play in making a masterpiece? What is the relationship between music, which when you get down to it really is just math and motion, and human perception? I could do an entire episode on Chai's work, what her team developed, and how they set out to design this automated system to analyze music, but that's going to have to wait for a later episode.
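One common way to find recurrent structure like verse and chorus repetition is a self-similarity matrix: compare every moment of a piece with every other moment and look for diagonals of high similarity. The Python sketch below illustrates that general idea only; it is not Chai's algorithm (her thesis describes her own methods), and it assumes the audio has already been reduced to one feature vector per time frame, such as a chroma vector.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_similarity(frames: list[list[float]]) -> list[list[float]]:
    """Matrix whose entry [i][j] compares frame i with frame j."""
    return [[cosine(a, b) for b in frames] for a in frames]


def best_repetition_lag(frames: list[list[float]]) -> tuple[int, float]:
    """Find the lag (in frames) at which the piece most strongly repeats itself.

    A section that recurs `lag` frames later shows up as a run of high values
    on the diagonal offset by `lag`, so each such diagonal is averaged. Lags
    are capped at half the piece so every candidate has enough overlap.
    """
    n = len(frames)
    sim = self_similarity(frames)
    best_lag, best_score = 0, float("-inf")
    for lag in range(1, n // 2 + 1):
        diagonal = [sim[i][i + lag] for i in range(n - lag)]
        score = sum(diagonal) / len(diagonal)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

With a toy "verse, chorus, verse, chorus" sequence of four frames repeated twice, the strongest diagonal sits at a lag of four frames, which is one full verse-plus-chorus cycle.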
For now, it's just important to understand that music is something we're able to experience on a level that machines just cannot. When we come back from the break, we're going to listen in on an interview Noel Brown had with Alexander Lerch and learn more about musical analysis and music generation. But first, let's take a quick break to thank our sponsor.

Now, like I said at the top of the show, earlier this year Producer Extraordinaire Noel Brown took a trip to Moogfest, a conference about music and technology and science and lots of other awesome stuff, and he got to speak with music analysis expert Alexander Lerch. What follows is their conversation.

So, as a bit of a layman, I interpret a lot of what you do as being in the field of generative music. Is that kind of along the right lines?

I would say my work may lead to generative music, but what I'm actually focusing on currently is analyzing music, figuring out what's going on in the music. It might start with you just having an audio signal, and you want to know: okay, what is the tempo, what is the key, what is the hook line, what is the bass doing? What is the mood of this piece of music?
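Tempo, the first question on that list, is one of the more approachable ones. A textbook approach, sketched here in Python under stated assumptions, is to autocorrelate an onset-strength envelope (a per-frame measure of how much new sound starts in each frame) and pick the strongest periodicity in a plausible tempo range. Computing the envelope from raw audio is assumed to have been done already, and the 60 to 180 BPM search range is an arbitrary default; this is not a description of the interviewee's own methods.

```python
def estimate_tempo(onset_envelope: list[float], frame_rate: float,
                   min_bpm: float = 60.0, max_bpm: float = 180.0) -> float:
    """Estimate tempo in BPM from an onset-strength envelope via autocorrelation.

    `frame_rate` is the number of envelope frames per second.
    """
    if not onset_envelope:
        raise ValueError("onset envelope is empty")
    n = len(onset_envelope)
    mean = sum(onset_envelope) / n
    x = [v - mean for v in onset_envelope]  # remove the constant offset
    # Convert the allowed tempo range into a range of lags, in frames.
    min_lag = max(1, int(frame_rate * 60.0 / max_bpm))
    max_lag = min(n - 1, int(frame_rate * 60.0 / min_bpm))
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Strong correlation means onsets tend to recur every `lag` frames.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return 60.0 * frame_rate / best_lag
```

Real music is messier than a click track: estimators like this often land on half or double the "felt" tempo, which is one reason even a question as basic as tempo is an active research topic.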
And that is where I'm trying to apply artificial intelligence and signal processing methods to extract this information from the signal.

So is that something like what the hit factories in Sweden are all about? It seems like they take a very analytical approach to writing pop songs: they've got people who are experts in hooks, people who are experts in verses, and all these kind of human algorithms for how long everything needs to play in order to elicit the proper response. Is it sort of along those lines as well?

Yes. You want to find out what makes songs successful, and there might be many, many different factors impacting that. There's the structure, of course, but there are so many other dimensions here that it's really hard to nail down. So by using the computer to analyze this, we try to find out more about what's going on, and maybe identify these little things that might make something popular or might give you goose bumps.

Is there an example of something that maybe one wouldn't expect to accomplish something like that, or just an element that maybe isn't so obvious to the average listener?
Okay, let me think. It's hard to come up with a very good example that would be surprising to everybody. But it's definitely the combination of tiny things, like maybe intonation that is somehow a little bit off. Or timing is a very obvious one: whether something grooves or not. It might have the same rhythm, but it might impact you on a completely different level. So these are examples that are maybe not surprising, but they still point in the right direction.

Yeah. Is it maybe an element of human interaction? If things are too quantized, it's maybe less emotional, whereas when people enter the notes by hand, they're a little bit imperfect. Or, for example, with the singer Adele, there was an article about how she sort of slides into her notes, and that gives you goose bumps because it's got this human quality where you sense that raw human emotion. In the same way, maybe someone who does electronic music makes mistakes and leaves them in, and that's what makes it more approachable.

Absolutely. One thing you have to keep in mind is that it's all genre and artist dependent as well, so there will definitely never be a formula.
There's no "if you want to have goose bumps, just do that." You can always analyze in retrospect: okay, this artist has this specific thing that he or she does, and that makes things so fascinating, or that gets you hooked. But that might not work for a different genre or for a new song, especially because it's also about expectation and what you already know. I can maybe let a computer compose something in Mozart's style, and it might be a really good Mozart piece, but that doesn't mean it really gets you as a listener, because you have heard so many Mozart pieces, and the original will still be better. It's always an imitation, so it might actually miss something there, even if the composition itself is very much the way Mozart did it.

Well, so is the end product of your research to make computers better at doing this, or are you just interested in breaking down pieces of music into their base elements?

At the moment, I'm doing exactly that: I'm breaking it down. I want to be able to let a computer transcribe what's going on in the music. I want to understand, maybe on a perceptual level, which parameters you can objectively extract from the audio signal.
What impact might they have on the listener? How does the listener react to certain specific characteristics of the music? But this knowledge can then also most definitely be used to actually generate new music, following specific rules that you have extracted from the music to create something new. And this is what my colleague Gil Weinberg works on a lot, with his robots that make music.

Okay, tell me more about that.

Yeah. He has a robot called Shimon. She's a marimba-playing robot. My colleague is also very much into jazz, so Shimon plays a lot of jazz, too. There's a lot of interaction on stage with the live musicians, question-and-answer games between what Shimon plays on the marimba and what the musician then plays. What's being played is constantly analyzed, and then the robot improvises, or tries to give some answers to that.

Jazz, I mean, you have to listen. You have to be able to follow the leads that your fellow musicians are putting out there; otherwise you're not any good.

Exactly. This whole interaction thing is part of the research, obviously, and it's not only the music; it's also eye contact and so on. That's why this robot actually has a head, even though the head doesn't make any sound: she can look at specific musicians and nod her head and so on, so you can kind of interact with the robot. This human-robot interaction is part of the research as well.

Fascinating. Can you describe the difference between an algorithm that does what you're talking about, analyzing music, and one that might create generative music? It seems like there's sort of a crossover between the two, and I was just wondering if you could spell that out a little bit for us.

In essence, the information you gain from the algorithm that analyzes music has to feed the generative algorithm. For example, you cannot compose something in classical style if you don't know classical style, so you have to learn it from data. That is the analysis part, and then you try to infer models from that.
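The analysis-then-generation pipeline described here can be illustrated with one of the simplest possible models, a first-order Markov chain over notes: the analysis step counts which note tends to follow which in example melodies, and the generation step takes a random walk through those counts. This Python sketch is a deliberately tiny stand-in chosen for illustration; it is not the approach used by the researchers discussed in this interview, and real systems learn far richer rules (structure, voice leading, performance nuances).

```python
import random
from collections import defaultdict


def learn_transitions(melodies: list[list[str]]) -> dict[str, dict[str, int]]:
    """Analysis step: count how often each note is followed by each other note."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            counts[current][following] += 1
    return {note: dict(followers) for note, followers in counts.items()}


def generate(transitions: dict[str, dict[str, int]], start: str,
             length: int, seed: int = 0) -> list[str]:
    """Generation step: random walk through the learned transition counts."""
    rng = random.Random(seed)  # seeded so a given melody can be reproduced
    melody = [start]
    while len(melody) < length:
        followers = transitions.get(melody[-1])
        if not followers:
            break  # dead end: this note was never followed by anything in the data
        notes = list(followers)
        weights = [followers[n] for n in notes]
        melody.append(rng.choices(notes, weights=weights, k=1)[0])
    return melody
```

Every note-to-note step in a generated melody is one the model saw in its training data, which is the point made in the interview: the generator can only draw on what the analysis has learned.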
So you have all this data: you have structural data, you have voice leading, maybe you have intonation if it's about performance. Then you try to fit this data into rules, and these rules would then generate music, for example, jazz improvisation or something like that.

So Brian Eno has been delving into generative music lately, and it's actually really interesting. There's a BBC documentary of him showing his methods, and he's just using Logic, and he has these little nodes, I guess you could call them scripts or whatever, that can set rules for, like, a drum part, where it will say subdivide every other whatever, any number of things you could input like that. I guess, are we at a place where that's still just kind of a gimmick, or are we really trying to recreate a human mind creating music, or is it just a different animal altogether, you know what I mean? Like, I'm wondering, are we really trying to have AI that can compose Mozart, or that can replace a producer or replace a songwriter, or is it just sort of like