Art and AI – Cambridge ML Summit ‘19

VICTOR DIBIA: I’m really
excited about this project, and it’s something
I’ve been working on for a couple of months. And so I’m just really happy
to share it with you guys. So my name is Victor Dibia. I’m a research engineer at
Cloudera Fast Forward Labs. We have an office in
Brooklyn, New York. And essentially, I spend
half of my time researching and deploying technologies, and
the other half helping clients build machine
learning solutions. And then at night,
I spend my time pretending that I’m an
artist, and I make art. And so it turns out that if you know neural networks, it can really help you do anything. So the title of my talk today is Art + AI: Generating Novel African Masks Using Generative Adversarial Networks. Cool. So just to give you
a little background. Why combine art and AI? And so for me, it’s a
little bit personal. I grew up in a West
African country. And something that was
really interesting when I was a little boy
was that, every year, at the end of the
year in December, our extended family would all travel to our village. And we’d spend about two weeks together. And so it was a fun period. And one of the highlights of that whole experience was something called the masquerade dances. And so what would happen is that we’d all be in this square, or the equivalent of a park, and you’d see interestingly dressed individuals like this. And they dance really fast. And they have all these
acrobatic maneuvers, and they jump around. And more importantly, they
had these really interesting elaborate masks. And so that was the
first time I actually had an encounter with these masks. And ever since then I’ve been really fascinated by them. And so fast forward to today. Twenty years later, I am a machine learning engineer. And I really, really want to connect to that aspect of my culture, and my identity, and my roots. But I’m not an artist. I can’t draw. And so what can I do? I can create an artificial neural network and leverage it as an assistant to explore and find new interpretations of what African masks could be. And so to give a really fast primer on what African mask art is: here you see two pictures. On the left, you
have a bronze mask. It’s from a tribe from Nigeria. And on the right, you
have another mask example. And it’s from a
tribe in the Congo. And one of the things, if
you look at more examples, one of the things
you’ll find is that they have some interesting
characteristics. And so immediately you
find that they’re fierce. And one of the reasons for
that is because they’re supposed to be
representative of deities. They’re supposed to be the
spirits of the ancestors, mythological beings, or other
beings that are believed to have power over humanity. Another characteristic
of these masks is that they’re
pretty sophisticated. And so they’re complex both
in art, design, and the range of materials that are used. And so you see, sometimes, wood. Really complex wood patterns
are used to create these masks. Sometimes it’s clay. Sometimes it’s bronze. Sometimes it’s iron. And so there’s a wide range of materials that are used to create these masks. The third interesting
thing about them is that they’re
surprisingly ancient. And so archaeological
discoveries show that some of these
pieces date as far back as the 9th century. And then, finally,
they are functional. And so they’re not
just artistic pieces created only to be consumed for their visual excellence. They’re actually functional. So much like you would wear a dress and would like it to reflect your identity, that’s pretty much the way people would don masks. And they would be a really important expression of identity. And so what did I do? And how did all of
this come together to inspire this project? And so first of all, it’s a
way for me to kind of explore, engage with my
culture and identity. And it’s a way for me to
bridge AI, art, and technology. So that’s one of the
motivations for this project. And another thing we
probably have noticed is that there’s an uptick in
this area of generative art and art inspired by AI. But one limitation in this
area is that most of it is characterized by
classic European art. And so I thought this might
be an opportunity to diversify the conversation in this space. And then finally,
if any of you work in the domain of
generative modeling, you will find that some
of the data sets you find, there are things like, I don’t
know, fashion [INAUDIBLE], and [INAUDIBLE] 100. And so there’s opportunity to
contribute more complex data sets that really help us
push the limits of what generative modeling can do. As a part of this, one of the goals of this project is to, at some point, contribute a data
set that enables more complex generative modeling. And so these are the main
motivations for this project. So before I go ahead, I’d like
to show you guys some results. And so the entire panel you
see here, none of these masks are actually real. And so everything
you see here are all generated images
and representations of what the neural
network that I trained thinks an African mask could
look like, and could be. And I have a lot
of favorites here. And I’ll show you
a couple of them as the presentation goes on. OK, so just to give some
background information on the process that I
worked through as part of this project. And so the first thing
is that this project falls within the broader
domain of efforts that have been termed as
computational creativity, or generative art,
or even code art. And so it is a pretty old field, and some of the incredible contributions here have been in terms of the tools that are used in this domain. So how many of us use P5JS? Yes? Processing? Cool. Awesome. And so, for most of these
tools, the intuition here is that you have
an idea of some art you want to create in your head. And you write some code. And then you bring it to life. And so some of the patterns
you see in the background, these are all generated by procedural code that, based on your instructions, based on your code, generates some bit of art. But neural networks change this paradigm just a little bit. And so neural networks
are a class of software that allow machines to learn
rules directly from data. And so as opposed to tools
like P5JS, or Processing, imagine if you wanted to
draw some circles in P5JS. You would write a
few lines of code. And, so if you want to do the
same with neural networks, rather than writing code, you
actually go and gather images. And so, in this
case, your image data becomes the code actually– or the image data becomes a
source input to your network. And then your network
generates the code. So why would we want to do that? And so if we wanted to generate
simple stuff like the patterns we see at the back, it’s easy to
actually write code to do this. But if you actually
want to write code that generates
African masks– and not just that,
you want stuff that’s kind of novel and
new, how would you go about writing code like that? You probably can’t. And so for this, you
actually need neural networks that help you kind of
imagine this process. And so, when I started
working on this project, I kind of split it into
three simple parts, which is the basic
data science process. And so the first interesting
part is curating the data set. The second part has to do
with training the model and making decisions
around what kind of GAN architecture do you use, how do
you set your hyperparameters. What kind of resolution
do you want to generate. And then finally,
there’s some work put into evaluating the results. And then interpreting
what the neural network has come up with. And so, first, data set curation. I guess, as one of the previous speakers mentioned, this is the most interesting,
most amazing, most delicious part of the process, right? And so I started with data
scraping code, manual image downloads, and so at
the end of this process, I had about 20,000 images. And then I went
through the process of careful hand curation. So this involves removing
vector images because we want realistic images. This involves removing
incorrect content and masks that had nothing to do with Africa. Then finally removing
a bunch of duplicates. And at the end, I was left with about 11,000 curated images. And so this process takes a bit of effort. It took a couple of weeks to carefully curate the data set and get it into a condition that I was comfortable using to train my neural network.
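As an aside, the exact-duplicate part of that cleanup is easy to automate. This is a small sketch, not the actual curation code (which was largely manual): it drops byte-identical files by hashing them, with made-up filenames and bytes as stand-in data.

```python
import hashlib

def drop_exact_duplicates(images):
    """Keep the first copy of each byte-identical image.

    `images` is a list of (filename, raw_bytes) pairs; hypothetical data,
    just to illustrate the hashing step.
    """
    seen, kept = set(), []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept

images = [("a.jpg", b"\x01\x02"),
          ("b.jpg", b"\x03\x04"),
          ("copy_of_a.jpg", b"\x01\x02")]
print(drop_exact_duplicates(images))  # ['a.jpg', 'b.jpg']
```

Near-duplicates (resized or re-encoded copies) need perceptual hashing or feature similarity instead, which is where the manual pass still earns its keep.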
And so once I had all my data available, the next interesting thing was to identify the right model to use. And for this I used something called generative adversarial networks, which are also called GANs. And so GANs are a really
interesting arrangement of two neural networks. So on one hand, we have
something called the generator. And on the other hand,
we have something called a discriminator. And so on the right,
you have an image that describes what the
generator looks like. And so what it does is that
it’s a feed forward network. It takes in a bit
of fixed noise. And the output of that process
is that it generates an image. And so the goal is that,
as the model is trained, you want it to take
in this noise vector and generate an
image that actually looks like an actual mask. And so the second
part of this model is something called
the discriminator. And it has a simple task. And so here, its goal
is to take in an image and tell if this image is
actually real or it’s fake. Fake in this sense
being something that was generated
by the generator, something that really didn’t
come from the training data set. And believe it or
not, it turns out that if you structure these
two networks together, and you get them to play a game,
at the end of that process, you actually have a
generator network that has learned to come up with
images that look plausible. They look like the actual masks. And you have a discriminator network that has learned to become really, really good at telling fake from real.
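The two-network arrangement can be sketched in a few lines. These are tiny fully connected stand-ins with made-up sizes, not the convolutional models used in the project; the point is just the wiring: noise goes into the generator, its output goes into the discriminator, and the two losses pull in opposite directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes for the sketch: 8-d noise, 16-d "image", 32 hidden units.
NOISE_DIM, IMG_DIM, HIDDEN = 8, 16, 32

G = {"w1": rng.normal(0, 0.1, (NOISE_DIM, HIDDEN)),
     "w2": rng.normal(0, 0.1, (HIDDEN, IMG_DIM))}
D = {"w1": rng.normal(0, 0.1, (IMG_DIM, HIDDEN)),
     "w2": rng.normal(0, 0.1, (HIDDEN, 1))}

def generator(z):
    # Noise vector in, fake "image" (here just a flat vector) out.
    h = np.tanh(z @ G["w1"])
    return np.tanh(h @ G["w2"])

def discriminator(x):
    # Image in, probability that the image is real out.
    h = np.tanh(x @ D["w1"])
    return 1.0 / (1.0 + np.exp(-(h @ D["w2"])))

z = rng.normal(size=(4, NOISE_DIM))   # a batch of 4 noise vectors
fake = generator(z)                   # 4 generated "images"
p_real = discriminator(fake)          # discriminator's verdict on them

# The opposing objectives: the discriminator wants p_real low on fakes,
# while the generator wants p_real high on the same fakes.
d_loss_on_fakes = -np.mean(np.log(1.0 - p_real + 1e-9))
g_loss = -np.mean(np.log(p_real + 1e-9))
print(fake.shape, p_real.shape)  # (4, 16) (4, 1)
```

Training alternates gradient steps on those two losses; in the real project both networks are convolutional and the "image" is an actual 64 px or 128 px picture.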
And to describe this whole process, I like to tell this
really funny story of the police and the forger. And so the story is like this. First, the forger
starts as an amateur. And so his task is to
create art that, you know, looks like a, let’s say,
van Gogh or a Rembrandt. And he wants to sell
it and make profit. But because he’s really
not good at this process, once he tries to sell the art,
the police comes and says, hey, that’s a fake, you know? You’re going to jail. And so this guy is caught. He goes to jail. There, he meets the Godfather. And the Godfather tells
him, oh, you know, this is the reason
why you were caught. And then he gets better,
and he gets out of jail. And he makes even better fakes. And at the end of this
process, the forger gets very good at creating fakes
that the police can’t detect. At the end of this process,
if the police does his work correctly, he becomes even
better at distinguishing fake from real. So this is pretty much the
process that a GAN actually goes through. And at the end of the
process, we want to– actually we’re really
interested in the generator, such that, when we’re done
with the training process, we can actually give
it some random noise. It gives us a new mask
image that looks interesting and looks like it’s a part of
the training data distribution. OK. And so I have a little
video on the left. Looks like it’s not
loading up right now. But what it was
going to show you was an example of what the
training cycle looks like. So initially what you would
see is that each of the boxes starts out as random noise. So the generator has very
little skill at that process. But as the training progresses through various epochs, you will find that
it actually starts to come up with stuff that
really looks like masks. And so there are a few processes
I went through in order to actually train this model. And so I started out with
a sample, DCGAN TensorFlow implementation. And so DCGAN stands for
Deep Convolutional Generative Adversarial Networks. So if you’re working
with images– in this case, I’m
generating images of masks– it’s really important
that your generator and your discriminator are actually convolutional variants. And so a DCGAN is just
one formulation of a GAN that allows you to
generate images. Along this process, one of
the challenges that I faced was that I had to write my own
custom data input pipeline. And one of the
issues that came up was there are some
differences in how I had generated my
data set compared to how the code was
actually expecting the image data to look like. And so when each of us here
process images, how do we specify the shape: height, width, and channels, or channels, height, and width?
height, width and channels, or channels, height, and width? Which one do we use? Channels at the end,
or channels in front? Any preferences? OK. So typically, for a given image, I specify the height, the
width, and then the number of channels. However, at some part
in the sample code, I found it was actually channels
first, then height and width. So what that meant was that the way the model was consuming the data was wrong. And all of the first 100 experiments I ran just didn’t work well.
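The mismatch is easy to demonstrate with a toy array, and the fix is a single transpose; the axis names below are the standard "channels last" and "channels first" conventions, not code from the talk.

```python
import numpy as np

# One RGB image in "channels last" layout: (height, width, channels).
# This is how the data set had been prepared.
img_hwc = np.zeros((64, 64, 3))

# The sample code expected "channels first": (channels, height, width).
# A transpose converts between the two layouts.
img_chw = np.transpose(img_hwc, (2, 0, 1))

print(img_hwc.shape)  # (64, 64, 3)
print(img_chw.shape)  # (3, 64, 64)
```

Both arrays contain the same 12,288 values, which is exactly why the bug is silent: nothing crashes, the model just learns from scrambled pixels.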
So one thing you really should do is to validate your data format. And so that will really
save you a lot of pain. And so the other thing I did was
that the sample code I found, it was designed to generate
32 by 32 pixel images. So I just had to modify the generator and the discriminator so that they learned to generate higher resolutions: 64 px and 128 px.
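That modification is mostly bookkeeping: in a DCGAN-style generator, each upsampling stage doubles the spatial size, so a higher output resolution just means more stages. A quick sanity check, assuming a 4 by 4 starting feature map (a common choice, not a detail confirmed in the talk):

```python
# Each upsampling stage in a DCGAN-style generator doubles the spatial
# size, so reaching a higher resolution means adding stages.
def stages_needed(start_size, target_size):
    stages, size = 0, start_size
    while size < target_size:
        size *= 2
        stages += 1
    return stages

BASE = 4  # assumed 4x4 starting feature map
print(stages_needed(BASE, 32), stages_needed(BASE, 64), stages_needed(BASE, 128))
# 3 4 5
```

The discriminator needs the mirror-image change: one extra downsampling stage per doubling of the input resolution.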
And then finally, you wrap all of this into an experiment, a couple
of files, Python files. In this case, I
trained with a TPU. And I’ll get into
a bit of rationale for why TPUs work well
for a problem like this. So you set up your TPU instance,
and then you get your data into a GCP bucket. And then finally you
run your training. And so it turns out
that setting up a TPU is a fairly
straightforward process. There are two interesting
things that you need to do. The first thing is, you
set up your project, set your zone, and
you need to enable TPUs for your GCP project. Once you do all of that, you
use the ctpu command line tool. Essentially, you run ctpu, and what it does is that it sets up a Compute Engine virtual machine and allocates some TPU resources where you can actually run your training. And so, once you’re done
with that, you could– in this case, this is
the code I just used to spin up my training process. So here you set an
environment variable that mentions your GCP bucket name, and then the name of your TPU. And DCGAN is the name of the file that actually runs my training.
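That setup can be sketched roughly as follows. The project, zone, bucket, TPU, and script names here are placeholders, not the ones from the talk, and `dcgan.py` is a guess at the training file just mentioned:

```shell
# One-time setup: point gcloud at your project and zone (placeholder names).
gcloud config set project my-gcp-project
gcloud config set compute/zone us-central1-b

# ctpu provisions a Compute Engine VM plus a TPU in one step.
ctpu up --name=mask-gan-tpu

# Point the training script at the GCS bucket and the TPU, then run it.
export GCS_BUCKET_NAME=my-mask-bucket
export TPU_NAME=mask-gan-tpu
python dcgan.py
```

The bucket matters because the TPU reads training data directly from GCS rather than from the VM’s local disk.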
So it’s a fairly straightforward process. And it’s kind of
easy to implement. And so why TPU? So the first value here is
that it’s easy to set up. It’s fast. And it turns out that TPUs are available. And so if you’re
here in the room, and you would like some
guidance in how you can go ahead running training
jobs on TPUs, I’m happy to discuss with you
a little bit more on that. And it turns out that
they’re really fast. And so you could use
them as individual TPUs, or you could use them in
the pod formation, where you can use all 16 TPUs. The main limitation
here is that you need to rewrite your code
such that it takes advantage of the TPU architecture. So this process, it’s not always
obvious to a lot of developers because your basic
TensorFlow code, or your Keras might not
run just as is on the TPU. You actually need to
write some code that gets your applications
to run on TPUs. And so, right now,
the way you do that is to use TPU
estimators, which are a little harder to work with. They’re less friendly
than Keras in TF 2.0. So I know that there’s some work in progress to allow TPU support for TF 2.0 and Keras. But at the last
time I checked, I don’t think that’s
available yet. And so, at this point,
I should be really happy: I have some results. I’m able to generate
some images. But one thing that
I actually did notice was that,
because of instabilities in training a GAN, some
of the best results I got were with just the 64
pixel configuration. So for 128 pixels,
there’s a known problem with training GANs
called mode collapse, where your GAN just
doesn’t generate a variety, or a lot of different images. And so what can we do to
actually address this issue? So it turns out that
we can use something called super resolution GANs. And so these are another
formulation of GANs where you have your generator. Rather than taking in
a randomized vector and then generating an
image, what it takes in is a lower resolution version of an image, and it learns to generate a high resolution version of that image.
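The input/output contract is easy to state in code. This sketch uses naive pixel repetition as the stand-in upscaler; a super resolution GAN has the same shape contract, but with a learned generator that fills in plausible detail instead of repeating pixels.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Baseline upscaling by pixel repetition. A super resolution GAN has
    the same contract (low-res in, high-res out) but a learned generator
    that hallucinates plausible detail instead of repeating pixels."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

low = np.zeros((64, 64, 3))     # a 64 px generated mask
high = upscale_nearest(low, 2)  # 128 px output
print(high.shape)  # (128, 128, 3)
```

The difference shows up in the results below: repetition can only smear existing pixels, whereas the GAN invents texture consistent with what it has seen during training.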
So the question is, does this work? And so I’m going to show you
some of my favorite examples. And so, on the
left here, we have 64 px image, something that
was generated by the model. And then we pass this through
a super resolution GAN. In this case, it is a GAN
trained within a product called Topaz Labs Gigapixel. And on the right, we
have a 378 pixel output. And so you will notice here
that there are actually some details that just don’t
exist in the 64 px image. But now they are
actually available in the 378 pixel images. And it is pretty
much like magic. And so then the
neural network here has kind of performed another
stage of interpretation where it has hallucinated
some details that look like they fit into the
resulting super resolution image. And so this is another example. Super low resolution on the
left, high resolution ESRGAN results on the right. So this is one more
example, one more example. These are some of my
favorite examples curated. I guess the only thing I
haven’t done at this stage is to name them. What would be a good
name for this guy? Sad face. Yeah, I’ll take some
time and name them. And so these are some of
the interesting things. And so, at this point, I do now
have high resolution images. And so the researcher in my head asks, you know, how novel are the images that are actually generated by this model? Did it really come up with stuff that’s new? Or did it just copy some stuff from the training data set? Did it mishmash stuff together? I really want to know. And I want to find ways
to actually explore what the GAN has actually learned. And so, to do this,
one thing I did was that I built a tool that
helps with algorithmic art inspection. And so basically, what I
do is that I use concepts from semantic image search. And for each of the
generated images I use a pretrained model
to extract some features. And then, using the
same pretrained model, I extract features from all
the images in the data set. And that way, for
each selected image, I can then go ahead
and say, for the image that I have selected– so for this image that I’ve
selected, which is generated, these are all the other
images in the data set that look the closest to it. And so, the good news is
that I didn’t see cases where the neural network
had it done verbatim copy. And it actually generated
substantially different, just not as smooth and
as clean as what we have in the original data set. And so this sort of
algorithmic inspection also enabled other things. So it enabled me to find
specific types of modes that the GAN had learned. And so, in this case,
it’s oblong mask images. And so the GAN had
learned to generate masks that had this oblong shape. In some other case,
it had learned to generate masks
that had, like, hair or hair-like projections. And in some other
places, it learned to generate of oval masks. And so this is towards the
end of my presentation. And so what are some
reflections on things that I thought were interesting,
and open questions that arose as I worked on this project? So the first thing that I
think is, as algorithmic arts, generative art
becomes more popular, I think we need more tools that
allow us to actually inspect novelty and visual quality
of what’s being generated. So yes, we have AI as a partner
that’s helping us create art. But it’s also
important that we have metrics and tools
that help us evaluate, is this stuff really novel? If Picasso was alive
today, would he be disappointed by what
we’re calling art and not? I guess that’s a
discussion for another day. And so the second thing
that arises frequently is, who has agency? And so the stuff
that’s generated, is it mine, or is
someone going to sue me some years down the line
and say, oh, you know, this stuff belongs
to the machine, the generative
adversarial network. And so there are a lot of
conversations around that. My thinking is that the process
of curating the data set is some form of art. And so that only
comes from the artist, which turns out to be me. The process of
identifying the right hyperparameters to give these
really nice, cool images, that also comes from the artist,
which turns out to be me. And then finally,
who names the art? I name it, and I also
kind of interpret it and explain what the meaning
is to the rest of the world. So I think, in this
way, the artist retains some kind of
artistic contribution. And the machine, well,
is just an assistant. And so there’s also
ethical issues that arise. It’s important that, you
know, artifacts generated from this project are
used in a way that’s inclusive of its origins,
the sources where it came from in West Africa. It’s important to
think of best practices around the distribution
of these data sets and every other
emerging concerns. And so, for some next steps,
conditional generation, can we say, oh, you know, give
me a mask that came from Congo? Or give me a mask that’s
from North Africa? So there are ways to inject
this kind of information to the training
process so that we can perform this kind
of condition generation. How can we improve the
quality of generated images? There’s some papers,
a speciality paper on relativistic GANs. So that’s one potential
approach there. And then finally, I’m working
on an interactive tool for navigating the latent
space of all the images that have been learned by this GAN. So that could be a potential
interactive installation. And so, if you’re really
interested in this sort of work, I actually
have some sample code. If you go on GitHub, you
actually can get your images. The code will actually help
you generate TF records with the images. And it will help you generate
a 64 px or 128 px GAN that generates these kind of images. And so I’ve written
some blog posts. And also this algorthmic
art inspection interface. And you can find
all of that there. Thank you. [APPLAUSE] [MUSIC PLAYING]
