Classification

You will find here the application of DA methods from the ADAPT package to a simple two-dimensional DA classification problem.

First we import the packages needed in the following. We will use matplotlib animation tools in order to get a visual understanding of the selected methods:

[1]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import matplotlib.animation as animation
from sklearn.metrics import accuracy_score
from matplotlib import rc
rc('animation', html='jshtml')

Experimental Setup

We now set up the synthetic classification DA problem using the make_classification_da function from adapt.utils.

[2]:
from adapt.utils import make_classification_da

Xs, ys, Xt, yt = make_classification_da()

x_grid, y_grid = np.meshgrid(np.linspace(-0.1, 1.1, 100),
                             np.linspace(-0.1, 1.1, 100))
X_grid = np.stack([x_grid.ravel(), y_grid.ravel()], -1)

We define here the show function, which we will use in the following to visualize the performance of the algorithms on the toy problem.

[3]:
def show(ax, yp_grid=None, yp_t=None, x_grid=x_grid, y_grid=y_grid,
         Xs=Xs, Xt=Xt, weights_src=50*np.ones(100), disc_grid=None):
    """Plot source and target samples on ax, with optional class
    and discriminator boundaries."""
    cm = matplotlib.colors.ListedColormap(['w', 'r', 'w'])
    # ax = plt.gca()
    if yp_grid is not None:
        ax.contourf(x_grid, y_grid, yp_grid, cmap=cm, alpha=1.)
        ax.plot([Xs[0, 0]], [Xs[0, 1]], c="red", label="class separation")
    if disc_grid is not None:
        cm_disc = matplotlib.colors.ListedColormap([(1, 1, 1, 0), 'g', (1, 1, 1, 0)])
        ax.contourf(x_grid, y_grid, disc_grid, cmap=cm_disc, alpha=0.5)
        ax.plot([Xs[0, 0]], [Xs[0, 1]], c="green", label="disc separation")
    if yp_t is not None:
        score = accuracy_score(yt.ravel(), yp_t.ravel())
        score = " - Acc=%.2f" % score
    else:
        score = ""
    ax.scatter(Xs[ys == 0, 0], Xs[ys == 0, 1], label="source", edgecolors='k',
               c="C0", s=weights_src[ys == 0], marker="o", alpha=0.9)
    ax.scatter(Xs[ys == 1, 0], Xs[ys == 1, 1], edgecolors='k',
               c="C0", s=2*weights_src[ys == 1], marker="*", alpha=0.9)
    ax.scatter(Xt[yt == 0, 0], Xt[yt == 0, 1], label="target" + score,
               edgecolors='k', c="C1", s=50, marker="o", alpha=0.9)
    ax.scatter(Xt[yt == 1, 0], Xt[yt == 1, 1], edgecolors='k',
               c="C1", s=100, marker="*", alpha=0.9)
    ax.legend(fontsize=14, loc="lower left")
    ax.set_xlabel("X0", fontsize=16)
    ax.set_ylabel("X1", fontsize=16)
[4]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
show(ax)
plt.show()
[Figure: source and target samples in the input space]

As we can see in the figure above (plotting the two dimensions of the input data), the source and target data define two distinct domains. We have modeled here a classical unsupervised DA problem where the goal is to build a good model on the orange data knowing only the labels (“o” or “*”, given by y) of the blue points.
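For reference, here is a quick look at the returned arrays (a hedged check: the shapes assume make_classification_da's default of 100 samples per domain, consistent with the fixed sizes used in show above):

[ ]:
print(Xs.shape, ys.shape)  # expected: (100, 2) (100,)
print(Xt.shape, yt.shape)  # expected: (100, 2) (100,)
# In this unsupervised DA setting, yt is never used for training:
# it only serves to evaluate the adapted models.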

We now define the base model used to learn the task. We use here a neural network with two hidden layers. We also define a SavePrediction callback in order to save the predictions of the neural network at each epoch.

[5]:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense, Reshape
from tensorflow.keras.optimizers import Adam

def get_model(input_shape=(2,)):
    model = Sequential()
    model.add(Dense(100, activation='elu', input_shape=input_shape))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer=Adam(0.01), loss='binary_crossentropy')
    return model
[6]:
from tensorflow.keras.callbacks import Callback

class SavePrediction(Callback):
    """
    Callback which stores predicted
    labels in history at each epoch.
    """
    def __init__(self, X_grid_=X_grid, Xt_=Xt):
        self.X_grid = X_grid_
        self.Xt = Xt_
        self.custom_history_grid_ = []
        self.custom_history_ = []
        super().__init__()

    def on_epoch_end(self, batch, logs={}):
        """Applied at the end of each epoch"""
        predictions = self.model.predict_on_batch(self.X_grid).reshape(100, 100)
        self.custom_history_grid_.append(predictions)
        predictions = self.model.predict_on_batch(self.Xt).ravel()
        self.custom_history_.append(predictions)

Src Only

First, let’s fit a network on the source data without any adaptation. As we can observe, the “o” labels from the target domain are misclassified. Because of the “*” blue points close to the “o” domain, the network learns a decision boundary that is not regularized enough and thus misclassifies the target “*” data.

[98]:
np.random.seed(0)
tf.random.set_seed(0)

model = get_model()
save_preds = SavePrediction()
model.fit(Xs, ys, callbacks=[save_preds],
          epochs=100, batch_size=100, verbose=0);
[99]:
def animate(i):
    ax.clear()
    yp_grid = (save_preds.custom_history_grid_[i] > 0.5).astype(int)
    yp_t = save_preds.custom_history_[i] > 0.5
    show(ax, yp_grid, yp_t)
[104]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6));
ani = animation.FuncAnimation(fig, animate, frames=100,
                              interval=60, blit=False, repeat=True)
[106]:
ani

src_only
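Beyond the animation, the final source-only score can be read off the callback history (a quick check reusing save_preds from the fit above; this is the same accuracy displayed in the animation legend):

[ ]:
# Predictions of the last epoch on the target sample
yp_t = save_preds.custom_history_[-1] > 0.5
print("Src-only target accuracy: %.2f" % accuracy_score(yt.ravel(), yp_t))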

DANN

We now consider the DANN method. This method consists in learning a new feature representation on which no discriminator network is able to distinguish between source and target data.

This is done with adversarial techniques following the principle of GANs.
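Schematically, and in our own notation (not taken from the ADAPT documentation), DANN solves the following min-max problem, with $\phi$ the encoder, $f$ the task network, $d$ the discriminator and $\lambda$ the trade-off parameter:

$$\min_{\phi,\, f}\ \max_{d}\ \ \mathcal{L}_{task}\big(f(\phi(X_s)),\, y_s\big) \;-\; \lambda \left[ \mathrm{BCE}\big(d(\phi(X_s)),\, 1\big) + \mathrm{BCE}\big(d(\phi(X_t)),\, 0\big) \right]$$

The discriminator $d$ tries to output 1 on encoded source data and 0 on encoded target data, while the encoder $\phi$ is trained both to support the task and to fool $d$, which pushes the two encoded distributions to overlap.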

[107]:
def get_encoder(input_shape=(2,)):
    model = Sequential()
    model.add(Dense(100, activation='elu', input_shape=input_shape))
    model.add(Dense(2, activation="sigmoid"))
    model.compile(optimizer=Adam(0.01), loss='mse')
    return model

def get_task(input_shape=(2,)):
    model = Sequential()
    model.add(Dense(10, activation='elu'))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer=Adam(0.01), loss='mse')
    return model

def get_discriminator(input_shape=(2,)):
    model = Sequential()
    model.add(Dense(10, activation='elu'))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer=Adam(0.01), loss='mse')
    return model
[21]:
from tensorflow.keras.callbacks import Callback

class SavePredictionDann(Callback):
    """
    Callback which stores predicted
    labels in history at each epoch.
    """
    def __init__(self, X_grid_=X_grid, Xt_=Xt, Xs_=Xs):
        self.X_grid = X_grid_
        self.Xt = Xt_
        self.Xs = Xs_
        self.custom_history_grid_ = []
        self.custom_history_ = []
        self.custom_history_enc_s = []
        self.custom_history_enc_t = []
        self.custom_history_enc_grid = []
        self.custom_history_disc = []
        super().__init__()

    def on_epoch_end(self, batch, logs={}):
        """Applied at the end of each epoch"""
        predictions = model.task_.predict_on_batch(
            model.encoder_.predict_on_batch(self.X_grid)).reshape(100, 100)
        self.custom_history_grid_.append(predictions)
        predictions = model.task_.predict_on_batch(
            model.encoder_.predict_on_batch(self.Xt)).ravel()
        self.custom_history_.append(predictions)
        predictions = model.encoder_.predict_on_batch(self.Xs)
        self.custom_history_enc_s.append(predictions)
        predictions = model.encoder_.predict_on_batch(self.Xt)
        self.custom_history_enc_t.append(predictions)
        predictions = model.encoder_.predict_on_batch(self.X_grid)
        self.custom_history_enc_grid.append(predictions)
        predictions = model.discriminator_.predict_on_batch(
            model.encoder_.predict_on_batch(self.X_grid)).reshape(100, 100)
        self.custom_history_disc.append(predictions)
[22]:
from adapt.feature_based import DANN

save_preds = SavePredictionDann()

model = DANN(get_encoder(), get_task(), get_discriminator(),
             lambda_=1.0, optimizer=Adam(0.001), random_state=0)
model.fit(Xs, ys, Xt, callbacks=[save_preds],
          epochs=500, batch_size=100, verbose=0);
[23]:
enc_s = np.concatenate(save_preds.custom_history_enc_s)
enc_t = np.concatenate(save_preds.custom_history_enc_t)
enc = np.concatenate((enc_s, enc_t))

x_min, y_min = enc.min(0)
x_max, y_max = enc.max(0)
# Override the computed range with fixed limits for the encoded space
x_min, y_min = (0., 0.)
x_max, y_max = (1., 1.)

def animate_dann(i):
    i *= 3
    yp_grid = (save_preds.custom_history_grid_[i] > 0.5).astype(int)
    yp_t = save_preds.custom_history_[i] > 0.5
    ax1.clear()
    ax2.clear()
    ax1.set_title("Input Space", fontsize=16)
    show(ax1, yp_grid, yp_t)
    ax2.set_title("Encoded Space", fontsize=16)
    Xs_enc = save_preds.custom_history_enc_s[i]
    Xt_enc = save_preds.custom_history_enc_t[i]
    X_grid_enc = save_preds.custom_history_enc_grid[i]
    x_grid_enc = X_grid_enc[:, 0].reshape(100, 100)
    y_grid_enc = X_grid_enc[:, 1].reshape(100, 100)
    disc_grid = (save_preds.custom_history_disc[i] > 0.5).astype(int)
    show(ax2, yp_grid, yp_t, x_grid=x_grid_enc, y_grid=y_grid_enc,
         Xs=Xs_enc, Xt=Xt_enc, disc_grid=disc_grid)
    ax2.set_xlabel("U0", fontsize=16)
    ax2.set_ylabel("U1", fontsize=16)
    ax2.set_xlim(x_min, x_max)
    ax2.set_ylim(y_min, y_max)
[108]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
ani = animation.FuncAnimation(fig, animate_dann, interval=60,
                              frames=166, blit=False, repeat=True)
[109]:
ani

dann

[ ]:
ani.save('dann.gif', writer="imagemagick")

As we can see in the figure above, when applying the DANN algorithm, the source data are projected onto the target data in the encoded space. Thus, a task network trained in parallel with the encoder and the discriminator is able to correctly classify “o” versus “*” in the target domain.

Instance Based

Finally, we consider here the instance-based method KMM. This method consists in reweighting the source instances in order to minimize the maximum mean discrepancy (MMD) between the source and target domains. The algorithm then trains a classifier using the reweighted source data.
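To make this objective concrete, here is a minimal sketch (our own illustration with an RBF kernel, not the ADAPT internals) of the weighted MMD that KMM minimizes with respect to the source weights:

[ ]:
from sklearn.metrics.pairwise import rbf_kernel

def weighted_mmd(Xs, Xt, weights, gamma=1.0):
    """Squared MMD between reweighted source and target samples."""
    w = weights / weights.sum()  # normalized source weights
    Kss = rbf_kernel(Xs, Xs, gamma=gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma=gamma)
    Kst = rbf_kernel(Xs, Xt, gamma=gamma)
    # E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)], with sources weighted by w
    return w @ Kss @ w + Ktt.mean() - 2 * (w @ Kst).mean()

With uniform weights this is the standard MMD between the two samples; KMM searches, through a quadratic program, for the weights making this quantity small, which produces the solver log shown below the fit cell.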

[20]:
from adapt.instance_based import KMM

save_preds = SavePrediction()

model = KMM(get_model(), gamma=1, random_state=0, loss="mae")
model.fit(Xs, ys, Xt, callbacks=[save_preds],
          epochs=100, batch_size=100, verbose=0);
Fit weights...
     pcost       dcost       gap    pres   dres
 0:  4.1412e+04 -1.3491e+06  3e+07  4e-01  2e-15
 1:  1.8736e+02 -2.9533e+05  4e+05  2e-03  5e-13
 2:  2.0702e+02 -3.6581e+04  4e+04  2e-05  7e-14
 3:  8.2217e+01 -1.6809e+04  2e+04  7e-06  4e-14
 4: -3.5699e+03 -2.6162e+04  2e+04  7e-06  3e-14
 5: -3.6501e+03 -7.6959e+03  4e+03  1e-06  5e-15
 6: -3.8524e+03 -8.5199e+03  5e+03  4e-16  2e-16
 7: -4.0411e+03 -4.6607e+03  6e+02  2e-16  2e-16
 8: -4.0654e+03 -4.4933e+03  4e+02  2e-16  1e-16
 9: -4.0776e+03 -4.1640e+03  9e+01  2e-16  2e-16
10: -4.0853e+03 -4.1556e+03  7e+01  2e-16  2e-16
11: -4.0894e+03 -4.0973e+03  8e+00  2e-16  1e-16
12: -4.0903e+03 -4.0934e+03  3e+00  1e-16  2e-16
13: -4.0906e+03 -4.0912e+03  6e-01  1e-16  1e-16
14: -4.0906e+03 -4.0911e+03  4e-01  2e-16  1e-16
15: -4.0907e+03 -4.0908e+03  1e-01  2e-16  1e-16
16: -4.0907e+03 -4.0908e+03  5e-02  2e-16  2e-16
17: -4.0908e+03 -4.0908e+03  2e-02  2e-16  1e-16
18: -4.0908e+03 -4.0908e+03  3e-03  1e-16  2e-16
Optimal solution found.
Fit Estimator...
[21]:
def animate_kmm(i):
    ax.clear()
    yp_grid = (save_preds.custom_history_grid_[i] > 0.5).astype(int)
    yp_t = save_preds.custom_history_[i] > 0.5
    weights_src = model.predict_weights().ravel() * 50
    show(ax, yp_grid, yp_t, weights_src=weights_src)
[110]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ani = animation.FuncAnimation(fig, animate_kmm, interval=60,
                              frames=100, blit=False, repeat=True)
[111]:
ani

kmm