gyoza.tutorials package

Submodules

gyoza.tutorials.data_synthesis module

gyoza.tutorials.data_synthesis.add_noise(x_1, x_2, noise_standard_deviation)

gyoza.tutorials.data_synthesis.create_data_set(S: array, manifold_function: Callable, noise_standard_deviation: Tuple[float, float]) → Tuple[array, array]

Creates a data set by passing the position S along the manifold through the manifold_function and adding zero-mean Gaussian noise to each dimension. The standard deviation of the noise in each dimension is given by the corresponding entry of noise_standard_deviation.

Parameters:

S (np.array) – The position along the manifold. Shape == [$M$], where $M$ is the number of instances.
manifold_function (Callable) – A function that maps S to coordinates on the manifold in the real two-dimensional plane.
noise_standard_deviation (Tuple[float, float]) – Standard deviations of the zero-mean normal noise added to the two respective z-dimensions.

Returns:

Z (numpy.ndarray) - The collection of points around the manifold. Shape == [$M$, $N$], where $M$ is the instance count and $N$ the dimensionality.
Y (numpy.ndarray) - The target for factorization. Here, it is simply a matrix of shape == [$M$, $F$], where $M$ is the instance count and $F=2$ the factor count. The first column consists of standard normally distributed random numbers that correspond to the residual factor. The second column is the standardized S for the position factor.
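A minimal sketch of the construction described above, assuming that manifold_function maps an array of shape [$M$] to an array of shape [$M$, 2]. The name create_data_set_sketch, the explicit rng argument and the unit-circle manifold in the usage lines are illustrative assumptions, not part of gyoza.

```python
from typing import Callable, Tuple

import numpy as np


def create_data_set_sketch(
    S: np.ndarray,
    manifold_function: Callable[[np.ndarray], np.ndarray],
    noise_standard_deviation: Tuple[float, float],
    rng: np.random.Generator,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical re-creation of create_data_set for illustration; not gyoza's code."""
    # Map each position along the manifold to 2-D coordinates, shape [M, 2] (assumed).
    on_manifold = manifold_function(S)
    # Zero-mean Gaussian noise with one standard deviation per dimension (broadcast over rows).
    noise = rng.standard_normal(on_manifold.shape) * np.asarray(noise_standard_deviation)
    Z = on_manifold + noise

    # Factor targets: column 0 is the standard normal residual factor,
    # column 1 is the standardized position S.
    residual = rng.standard_normal(S.shape[0])
    position = (S - S.mean()) / S.std()
    Y = np.stack([residual, position], axis=1)  # shape [M, F=2]
    return Z, Y


def unit_circle(s: np.ndarray) -> np.ndarray:
    # Hypothetical example manifold: the unit circle, parameterized by angle.
    return np.stack([np.cos(s), np.sin(s)], axis=1)


rng = np.random.default_rng(0)
S = np.linspace(0.0, 2.0 * np.pi, 500)
Z, Y = create_data_set_sketch(S, unit_circle, (0.1, 0.05), rng)
```

Standardizing S puts the position factor on the same scale as the standard normal residual factor, which matches the description of Y.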

gyoza.tutorials.data_synthesis.factorized_pair_iterator(X: ndarray, Y: ndarray, batch_size: int, target_correlations: List[float]) → Generator[Tuple[Tensor, Tensor], None, None]

This infinite iterator yields pairs of instances X_a and X_b along with their corresponding factorized correlation Y_ab. Pairs are obtained by first drawing an x_a instance uniformly at random from X. Then, given this x_a instance, the target_correlations and a standard normally distributed helper random variable, a hypothetical x_b instance is computed. The X instance closest (in the L2-norm sense) to this hypothetical x_b instance is then chosen as the actual x_b instance. The two instances x_a and x_b form a pair. If batch_size is large relative to the total number of instances in X, the same pair may occur more than once in a batch.

Parameters:

X (numpy.ndarray) – Input data of shape [instance count, …], where … is any shape convenient for the caller.
Y (numpy.ndarray) – Scores of factors of shape [instance count, factor count], including the residual factor at index 0. These y-values are assumed to be normally distributed and uncorrelated.
batch_size (int) – Desired number of instances per batch.
target_correlations (List[float]) – The desired correlations that x_a and x_b shall have along the different factors. The residual factor, with its correlation of zero, shall also be included. The order of entries in target_correlations should be aligned with the order of factors in Y.

Yields:

X_ab (numpy.ndarray) - A batch of instance pairs of shape [batch_size, 2, …], where 2 is due to the concatenation of X_a and X_b and … is the same instance-wise shape as for X.
Y_ab (numpy.ndarray) - The target_correlations prepended with a zero and cast to shape [batch_size, factor count], including the residual factor.
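The pairing scheme can be sketched as follows. This is not gyoza's implementation: it assumes that the hypothetical x_b is formed in factor space as $y_b = r \odot y_a + \sqrt{1 - r^2} \odot \epsilon$ with $\epsilon$ standard normal (the usual way to obtain a standard normal variable with correlation $r$ to $y_a$), that the nearest neighbour is searched among the rows of Y, and that target_correlations already contains the residual factor's zero. The name factorized_pair_iterator_sketch and the rng argument are hypothetical.

```python
from typing import Iterator, List, Tuple

import numpy as np


def factorized_pair_iterator_sketch(
    X: np.ndarray,
    Y: np.ndarray,
    batch_size: int,
    target_correlations: List[float],
    rng: np.random.Generator,
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Hypothetical sketch of the pairing scheme; not gyoza's implementation."""
    r = np.asarray(target_correlations, dtype=float)  # one correlation per factor
    instance_count = X.shape[0]
    while True:  # infinite iterator
        # Draw the a-instances uniformly at random.
        a = rng.integers(0, instance_count, size=batch_size)
        y_a = Y[a]  # shape [batch_size, F]
        # Hypothetical partner: correlation r with y_a along each factor.
        epsilon = rng.standard_normal(y_a.shape)
        y_b_hypothetical = r * y_a + np.sqrt(1.0 - r**2) * epsilon
        # Nearest actual instance (L2 distance in factor space) to each hypothetical partner.
        distances = np.linalg.norm(
            Y[np.newaxis, :, :] - y_b_hypothetical[:, np.newaxis, :], axis=2
        )  # shape [batch_size, instance_count]
        b = np.argmin(distances, axis=1)
        X_ab = np.stack([X[a], X[b]], axis=1)  # shape [batch_size, 2, ...]
        Y_ab = np.tile(r, (batch_size, 1))  # target correlations repeated per pair
        yield X_ab, Y_ab
```

Setting every entry of target_correlations to 1 makes the hypothetical partner coincide with x_a, so each pair consists of one instance twice; entries near 0 yield essentially unrelated partners along that factor.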

gyoza.tutorials.data_synthesis.reset_random_number_generators(seed: int)

This function resets the random number generators of Python, NumPy and TensorFlow.

Parameters:

seed (int) – The new seed that shall be provided to each random number generator.
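A minimal sketch of what resetting these generators typically involves, using random.seed, numpy.random.seed and tf.random.set_seed. The function name is hypothetical, and TensorFlow is treated as optional here so that the sketch also runs where it is not installed.

```python
import random

import numpy as np


def reset_random_number_generators_sketch(seed: int) -> None:
    """Hypothetical sketch: reseed Python's, NumPy's and (if installed) TensorFlow's global RNGs."""
    random.seed(seed)  # Python's built-in generator
    np.random.seed(seed)  # NumPy's legacy global generator
    try:
        import tensorflow as tf  # optional in this sketch
    except ImportError:
        return
    tf.random.set_seed(seed)  # TensorFlow's global seed
```

Calling the function twice with the same seed makes subsequent draws from each generator repeat.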

gyoza.tutorials.mathematics module

gyoza.tutorials.mathematics.archimedian_spiral(xs, alpha)

gyoza.tutorials.mathematics.cartesian_to_polar(x, y)

gyoza.tutorials.mathematics.logarithmic_spiral(xs, alpha, beta)

gyoza.tutorials.mathematics.normal(f, f_prime, x_0)

gyoza.tutorials.mathematics.polar_to_cartesian(rho, phi)

gyoza.tutorials.mathematics.rotate(xs, ys, theta)

gyoza.tutorials.mathematics.tangent(f, f_prime, x_0)
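The mathematics functions above are listed without descriptions. Going only by their names and parameters, they presumably implement the textbook formulas sketched below. The return conventions of the actual gyoza functions (for example, whether tangent returns a callable or sampled points, and the direction of rotate) are not documented here; the *_sketch names are hypothetical, and rotation is assumed to be counter-clockwise.

```python
import numpy as np


def polar_to_cartesian_sketch(rho, phi):
    # x = rho cos(phi), y = rho sin(phi)
    return rho * np.cos(phi), rho * np.sin(phi)


def cartesian_to_polar_sketch(x, y):
    # rho = sqrt(x^2 + y^2), phi = atan2(y, x) in (-pi, pi]
    return np.hypot(x, y), np.arctan2(y, x)


def rotate_sketch(xs, ys, theta):
    # Counter-clockwise rotation by theta radians about the origin (assumed direction).
    c, s = np.cos(theta), np.sin(theta)
    return c * xs - s * ys, s * xs + c * ys


def tangent_sketch(f, f_prime, x_0):
    # Tangent line of f at x_0: t(x) = f(x_0) + f'(x_0) (x - x_0).
    return lambda x: f(x_0) + f_prime(x_0) * (x - x_0)


def normal_sketch(f, f_prime, x_0):
    # Normal line of f at x_0 (requires f'(x_0) != 0): n(x) = f(x_0) - (x - x_0) / f'(x_0).
    return lambda x: f(x_0) - (x - x_0) / f_prime(x_0)
```

For $f(x) = x^2$ at $x_0 = 1$, the tangent is $t(x) = 2x - 1$ and the normal is $n(x) = 1 - (x - 1)/2$.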

gyoza.tutorials.modelling module

gyoza.tutorials.modelling.cross_validate(Z: ndarray, Y: ndarray, target_correlations: List[float], networks: List[DisentanglingFlowModel], batch_size: int, epoch_count: int, manifold_name: str, plot_losses: bool = False) → Tuple[List[ndarray], List[ndarray]]

Performs cross-validation on the provided networks. It first shuffles the data Z and Y, then partitions them into fold-count many equally sized subsets, and then calibrates each network on the data set minus one subset of the partition. The held-out subsets are returned for model evaluation. The fold count is inferred from the length of networks. This implementation assumes that the networks are SupervisedFactorNetworks that are calibrated using the volatile_factorized_pair_iterator().

Parameters:

Z (numpy.ndarray) – The input to the networks (when in inference mode) of shape [$M$, $N$], where $M$ is the total instance count and $N$ the dimensionality.
Y (numpy.ndarray) – The factorized targets per instance of shape [$M$, $F$], where $M$ is the total instance count and $F$ is the factor count.
networks (List[SupervisedFactorNetwork]) – The networks to be calibrated. They are assumed to be compiled with an optimizer.
batch_size (int) – The size of batches used during calibration.
epoch_count (int) – The number of times the iterator shall cycle through the data during calibration of each network.
manifold_name (str) – The name of the manifold on which the networks are fitted. Used for the title of the plot.
plot_losses (bool, default False) – Indicates whether the loss per epoch shall be plotted for each network.
target_correlations (List[float]) – The desired correlations that x_a and x_b shall have along the different factors. Here, only the actual factors should be included in the list; the residual factor will be assigned a correlation of zero. The order of entries in target_correlations should be aligned with the order of factors in Y.

Returns:

Z_test (List[numpy.ndarray]) - The test subsets for the model input, one for each instance in networks, having shape [$M^*$, $N$], where $M^*$ is the test set size, equal to the total instance count in Z divided by the number of instances in networks, and $N$ is the dimensionality.
Y_test (List[numpy.ndarray]) - The test subsets for the models' factor targets, one for each instance in networks, having shape [$M^*$, $F$], where $M^*$ is the test set size, equal to the total instance count in Z divided by the number of instances in networks, and $F$ is the number of factors.
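The fold logic described above can be sketched as plain k-fold cross-validation. The fit callback is a hypothetical stand-in for gyoza's calibration loop (pair iterator, batch_size and epoch_count handling are omitted). np.array_split is used so that the sketch also works when the instance count is not divisible by the fold count, in which case fold sizes differ by at most one.

```python
from typing import Callable, List, Tuple

import numpy as np


def cross_validate_sketch(
    Z: np.ndarray,
    Y: np.ndarray,
    networks: List[object],
    fit: Callable[[object, np.ndarray, np.ndarray], None],
    rng: np.random.Generator,
) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Hypothetical k-fold sketch; `fit` stands in for the actual calibration."""
    fold_count = len(networks)  # one fold per network
    order = rng.permutation(Z.shape[0])  # shuffle Z and Y jointly
    folds = np.array_split(order, fold_count)
    Z_test, Y_test = [], []
    for k, network in enumerate(networks):
        train_indices = np.concatenate([folds[j] for j in range(fold_count) if j != k])
        fit(network, Z[train_indices], Y[train_indices])  # calibrate on all folds but k
        Z_test.append(Z[folds[k]])  # held-out fold k for evaluation
        Y_test.append(Y[folds[k]])
    return Z_test, Y_test
```

Because one permutation indexes both Z and Y, each held-out Z subset stays row-aligned with its Y subset.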

gyoza.tutorials.plotting module

gyoza.tutorials.plotting.color_palette = array([[255., 0., 0.], [255., 17., 0.], [255., 34., 0.], [255., 51., 0.], [255., 68., 0.], [255., 85., 0.], [255., 102., 0.], [255., 119., 0.], [255., 136., 0.], [255., 153., 0.], [255., 170., 0.], [255., 187., 0.], [255., 204., 0.], [255., 221., 0.], [255., 238., 0.], [255., 255., 0.], [213., 255., 0.], [170., 255., 0.], [128., 255., 0.], [ 85., 255., 0.], [ 43., 255., 0.], [ 0., 255., 0.], [ 0., 255., 63.], [ 0., 255., 127.], [ 0., 255., 191.], [ 0., 255., 255.], [ 0., 232., 255.], [ 0., 209., 255.], [ 0., 186., 255.], [ 0., 163., 255.], [ 0., 140., 255.], [ 0., 116., 255.], [ 0., 93., 255.], [ 0., 70., 255.], [ 0., 47., 255.], [ 0., 24., 255.], [ 0., 0., 255.], [ 19., 0., 255.], [ 39., 0., 255.], [ 58., 0., 255.], [ 78., 0., 255.], [ 98., 0., 255.], [117., 0., 255.], [137., 0., 255.], [156., 0., 255.], [176., 0., 255.], [196., 0., 255.], [215., 0., 255.], [235., 0., 255.], [255., 0., 255.], [255., 0., 213.], [255., 0., 170.], [255., 0., 128.], [255., 0., 85.], [255., 0., 43.]]): A convenience variable, storing the color palette that is computed by __make_color_palette__().

gyoza.tutorials.plotting.evaluate_and_plot_networks(Z_test: List[ndarray], Y_test: List[ndarray], networks: List[FlowModel], manifold_name: str)

For each network, plots a scatter plot of the predicted versus the actual position along the manifold, along with a bar for the proportion of explained variance.

Parameters:

Z_test (List[np.ndarray]) – A list of test sets used as input to the corresponding network in networks. The list is expected to have the same length as networks and each test set is assumed to have shape [$M^*$, $N$], where $M^*$ is the number of instances in a test set and $N=2$ is the dimensionality of an instance.
Y_test (List[np.ndarray]) – A list of test sets used to evaluate the corresponding network in networks. The list is expected to have the same length as networks and each test set is assumed to have shape [$M^*$, $F$], where $M^*$ is the number of instances in a test set and $F=2$ is the number of factors. It is assumed that the factor at index 1 encodes the position along the manifold.
networks (List[mfl.SupervisedFactorNetwork]) – A list of calibrated networks that take Z_test as input and whose output (of the same shape as the input) encodes the position along the data manifold at index 1.
manifold_name (str) – The name of the manifold used in the figure title.
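The docstring does not define the proportion of explained variance. One common choice, assumed in this sketch, is the squared Pearson correlation between actual and predicted position, which equals the $R^2$ of a least-squares line and is insensitive to the scale of the network output. The function name is hypothetical.

```python
import numpy as np


def explained_variance_sketch(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Squared Pearson correlation between actual and predicted values (assumed metric)."""
    # For simple linear regression, r^2 equals the proportion of variance in `actual`
    # explained by a least-squares line fitted on `predicted`.
    r = np.corrcoef(actual, predicted)[0, 1]
    return float(r**2)
```

For network k, this would be applied to Y_test[k][:, 1] and to index 1 of the network's output on Z_test[k].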

gyoza.tutorials.plotting.make_2_dimensional_gaussian(mu: ndarray, sigma: ndarray, shape: List[int]) → ndarray

Generates a 2-dimensional Gaussian distribution.

Parameters:

mu (np.ndarray) – The two means for the Gaussian variables. Assumed to be of shape [2].
sigma (np.ndarray) – The covariance matrix. Assumed to be of shape [2,2].
shape (List[int]) – The desired shape of the output.

Returns:

X (numpy.ndarray) - Coordinates of the grid of the two variables. Shape == [shape[0] * shape[1], 2].
p (numpy.ndarray) - The probabilities associated with the coordinates X. Shape == [shape[0] * shape[1]].
D (numpy.ndarray) - A matrix that arranges p with desired shape.
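A sketch of the grid-plus-density computation, assuming the grid spans extent = 3 standard deviations around mu along each axis (the actual grid range is not documented here; extent and the function name are hypothetical). Note that p here holds values of the multivariate normal density, $p(x) = \exp(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)) / (2\pi\sqrt{\det \Sigma})$, which need not sum to one over the grid.

```python
from typing import List, Tuple

import numpy as np


def make_2_dimensional_gaussian_sketch(
    mu: np.ndarray, sigma: np.ndarray, shape: List[int], extent: float = 3.0
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical sketch; `extent` is the grid half-width in standard deviations."""
    sd = np.sqrt(np.diag(sigma))
    x_0 = np.linspace(mu[0] - extent * sd[0], mu[0] + extent * sd[0], shape[0])
    x_1 = np.linspace(mu[1] - extent * sd[1], mu[1] + extent * sd[1], shape[1])
    grid_0, grid_1 = np.meshgrid(x_0, x_1, indexing="ij")
    X = np.stack([grid_0.ravel(), grid_1.ravel()], axis=1)  # [shape[0] * shape[1], 2]
    # Quadratic form d^T Sigma^-1 d for every grid point d = x - mu.
    d = X - mu
    mahalanobis = np.einsum("ij,jk,ik->i", d, np.linalg.inv(sigma), d)
    p = np.exp(-0.5 * mahalanobis) / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))
    D = p.reshape(shape[0], shape[1])  # p arranged on the grid
    return X, p, D
```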

gyoza.tutorials.plotting.make_color_wheel(pixels_per_inch: int, pixel_count: int = 128, swirl_strength: float = 0, gaussian_variance: float = 1) → ndarray

Generates an image of a color wheel with swirl.

Parameters:

pixels_per_inch (int) – The density of pixels per inch on the user machine.
pixel_count (int, optional) – The desired width and height of the image in pixels. Defaults to 128.
swirl_strength (float, optional) – The strength of swirl applied to the color wheel. Sensible values are in the range [0, 10]. The sign is ignored. Defaults to 0.
saturation (float, optional) – The saturation of the colors. Valid values are in the range [0, 1], where 0 corresponds to a white image and 1 to a fully saturated image. Defaults to 1.

Returns:

image (np.ndarray) - The image of shape [pixel_count, pixel_count, 4] where 4 are the channels.

gyoza.tutorials.plotting.make_radial_line(radius: float, rotation: float, point_count: int) → ndarray

Generates a straight line with point_count many points that has one endpoint at the origin and the other endpoint on the circle of the given radius, at the angle given by rotation.

Parameters:

radius (float) – The radius of the circle from which lines are generated.
rotation (float) – The angle of rotation of the line in radians. Movement is clockwise.
point_count (int) – The number of points on the line.

Returns:

x, y (np.ndarray) - The coordinates of the line, with shape [point_count, 2].
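A minimal sketch, assuming that rotation = 0 points along the positive x-axis and that clockwise rotation therefore corresponds to the mathematical angle -rotation. The function name is hypothetical.

```python
import numpy as np


def make_radial_line_sketch(radius: float, rotation: float, point_count: int) -> np.ndarray:
    """Hypothetical sketch: rotation 0 points along the positive x-axis (assumed)."""
    distances = np.linspace(0.0, radius, point_count)  # from the origin out to the circle
    # Clockwise rotation by `rotation` equals a mathematical angle of -rotation.
    x = distances * np.cos(-rotation)
    y = distances * np.sin(-rotation)
    return np.stack([x, y], axis=1)  # shape [point_count, 2]
```

Under these assumptions, a rotation of $\pi/2$ produces a line from the origin straight down to (0, -radius).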

gyoza.tutorials.plotting.plot_contribution_per_layer(network: FlowModel, s_range: Tuple[float, float], manifold_function: Callable, manifold_name: str, layer_steps: List[int], step_titles: List[str])

Plots for each layer (or rather, each step of consecutive layers) its contribution to the data transformation. The plot is structured into three rows. The first row shows a stacked bar chart whose bottom segment is the contribution due to the affine transformation and whose top segment is the contribution due to the higher-order transformation. To better understand the mechanisms behind these contributions, the bottom row shows a pictogram of the actual affine transformation and the middle row one of the remaining higher-order part. This separation serves to assess the complexity of the transformation, whereby affine is considered simple and higher-order is considered complex. The decomposition into affine and higher-order parts is obtained by means of a first-order Maclaurin series.

Parameters:

network (gyoza.modelling.flow_layers.FlowModel) – The network whose transformation shall be visualized. It is expected to map 1-dimensional manifolds from the real 2-dimensional plane to the real 2-dimensional plane.
s_range (Tuple[float, float]) – The lower and upper bounds for the position along the manifold, respectively.
manifold_function (Callable) – A function that maps from position along manifold to coordinates on the manifold in the real two dimensional plane.
manifold_name (str) – The name of the manifold used for the figure title.
layer_steps (List[int]) – A list of steps across layers of the network. If, for instance, the network has 7 layers and visualization shall be done after the 1st, 3rd and 7th layer, then layer_steps shall be set to [1, 3, 7]. The minimum entry shall be 1, the maximum entry shall be the number of layers in network, and all entries shall be strictly increasing.
step_titles (List[str]) – The titles associated with each step in layer_steps. Used as titles in the figure.
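The affine / higher-order split can be sketched for any map $f$ from [$M$, $N$] to [$M$, $N$]: the first-order Maclaurin expansion gives the affine part $f(0) + J_f(0)\,x$, and the higher-order part is the remainder $f(x) - f(0) - J_f(0)\,x$. For a given step, f would be the composition of the network's layers up to that step. Estimating the Jacobian by central differences is an assumption of this sketch (gyoza may compute it differently), the function name is hypothetical, and how the two parts are turned into bar heights is not shown.

```python
from typing import Callable, Tuple

import numpy as np


def maclaurin_decomposition_sketch(
    f: Callable[[np.ndarray], np.ndarray], X: np.ndarray, step: float = 1e-5
) -> Tuple[np.ndarray, np.ndarray]:
    """Split f(X) into its first-order Maclaurin (affine) part and the higher-order remainder."""
    dimension = X.shape[1]
    origin = np.zeros((1, dimension))
    f_0 = f(origin)  # shape [1, N]
    # Central-difference Jacobian at the origin: column j approximates df/dx_j.
    jacobian = np.empty((dimension, dimension))
    for j in range(dimension):
        e = np.zeros_like(origin)
        e[0, j] = step
        jacobian[:, j] = ((f(origin + e) - f(origin - e)) / (2.0 * step))[0]
    affine = f_0 + X @ jacobian.T  # f(0) + J x for every row x of X
    higher_order = f(X) - affine  # everything the affine approximation misses
    return affine, higher_order
```

For an affine map the higher-order part vanishes; for an elementwise square the affine part vanishes instead.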

gyoza.tutorials.plotting.plot_input_output(network: FlowModel, S, manifold_function: Callable, noise_standard_deviation: float, manifold_name: str, zoom_output: bool = False)

Plots the input and output of the network. Points are colored using a color wheel. The marginal distributions are shown as well.

Parameters:

network (mfl.SupervisedFactorNetwork) – The network that shall process the data. It is expected to map from [$M$, $N$] to [$M$, $N$], where $M$ is the instance count and $N=2$ the dimensionality.
S – A one-dimensional array providing the position along the manifold.
manifold_function (Callable) – A function that maps from position on the manifold ($S$, shape == [instance count $M$]) to coordinates in $N=2$-dimensional space.
noise_standard_deviation (float) – Standard deviation of the noise that shall be added to the data before passing it through the model.
manifold_name (str) – The name of the manifold used in the title.
zoom_output (bool, optional, defaults to False) – Indicates whether the output should have the same zoom as the input (False) or should be zoomed according to its own scale (True).

gyoza.tutorials.plotting.plot_instance_pairs(S: ndarray, Z_a: ndarray, Z_b: ndarray, Y_ab: ndarray, manifold_function: Callable, manifold_name: str, pair_count: int = 3)

Plots pairs of instances along with their similarities and the manifold (without noise).

Parameters:

S (np.array) – The position along the manifold. Shape == [$M$, 1], where $M$ is the number of instances.
Z_a (numpy.ndarray) – The coordinates of the a-instances to be plotted. Shape is assumed to be [$M$, $N$], where $M$ is the number of instances and $N = 2$ is the dimensionality of an instance.
Z_b (numpy.ndarray) – The same as Z_a, but for b-instances.
Y_ab (numpy.ndarray) – The similarities of the Z_ab instances. Shape is assumed to be [$M$, $F$], where $M$ is the number of instances and $F=2$ at axis 1 is the factor count.
manifold_function (Callable) – A function that takes as input the position S along the manifold and provides as output the two coordinates that are associated with that position along the manifold. Hence, [$M$, 1] -> [$M$, $N$], where $M$ is the number of instances and $N=2$ their dimensionality.
manifold_name (str) – A name assigned to the manifold that is used as a label in the plot.
pair_count (int, optional, defaults to 3) – The number of pairs to be illustrated.

gyoza.tutorials.plotting.plot_instance_pairs_2(Z_a: ndarray, Z_b: ndarray, title_suffix: str = '$Z$')

Plots the instance pairs of Z_ab (or Z_tilde_ab) in two scatter plots. The first scatter plot shows the first dimension (index 0) of instance a and b while the second scatter plot shows the second dimension (index 1) of instances a and b. In the margins of each scatter plot, the marginal histograms are shown.

Parameters:

Z_a (numpy.ndarray) – The coordinates of the a-instances to be plotted. Shape is assumed to be [$M$, $N$], where $M$ is the number of instances and $N = 2$ is the dimensionality of an instance.
Z_b (numpy.ndarray) – The same as Z_a, but for b-instances.
title_suffix (str, optional, defaults to rf'$Z$') – The suffix to be added to the title, usually rf'$Z$' to indicate that instances come from the Z-space or rf'$\tilde{Z}$' to indicate that they come from the Z_tilde-space.

gyoza.tutorials.plotting.plot_inverse_point(position: float, residual: float, S: ndarray, network: FlowModel, manifold_function: Callable, manifold_name: str)

This function visualizes the network's inversion ability by plotting the inverse of the point [residual, position]. It also plots the manifold_function evaluated on S for reference.

Parameters:

position (float) – The position along the manifold that shall be entered at dimension index 1 for inversion via the network.
residual (float) – The residual that shall be entered at dimension index 0 for inversion via the network.
S (numpy.ndarray) – Points along which the manifold_function shall be evaluated during plotting.
network (mfl.SupervisedFactorNetwork) – A network calibrated to disentangle manifold position (factor at dimension 1) from deviation from the manifold (factor at dimension 0). It shall map from [$M$, $N$] to [$M$, $N$], where $M$ is the instance count and $N=2$ is the dimensionality.
manifold_function (Callable) – A function that maps from position on the manifold ($S$, shape == [instance count $M$]) to coordinates in $N=2$-dimensional space.
manifold_name (str) – The name of the manifold used for the figure title.

gyoza.tutorials.plotting.plot_loss_trajectory(epoch_loss_means: List[float], epoch_loss_standard_deviations: List[float], manifold_name: str)

Plots the loss trajectory of model calibration, with the standard deviation across batches shown as an error band around the mean.

Parameters:

epoch_loss_means (List[float]) – The mean across batches for each epoch. Length = [epoch count]
epoch_loss_standard_deviations (List[float]) – The standard deviation across batches for each epoch. Length = [epoch count]
manifold_name (str) – The name of the manifold on which the model was calibrated. Used for the title.

gyoza.tutorials.plotting.swirl(x: ndarray, y: ndarray, x0: float = 0, y0: float = 0, radius: float = 5, rotation: float = 0, strength: float = 5) → Tuple[ndarray, ndarray]

Performs a swirl operation on given x and y coordinates.

Parameters:

x, y (numpy.ndarray) – Coordinates of points that shall be swirled.
x0, y0 (float, optional) – The origin of the swirl. Default to 0.
radius (float, optional) – The extent of the swirl. Small values indicate local swirl, large values indicate global swirl. Defaults to 5.
rotation (float, optional) – Adds a rotation angle to the swirl. Defaults to 0.
strength (float, optional) – Indicates the strength of swirl. Defaults to 5.

Returns:

x_new, y_new (numpy.ndarray) - The transformed coordinates.
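A sketch of one common swirl formulation, assumed here: each point keeps its distance $\rho$ to (x0, y0), while its angle is increased by rotation + strength $\cdot \exp(-\rho / \text{radius})$, so the effect is strongest near the origin and fades beyond radius. scikit-image's swirl uses a mapping of the same form, but gyoza's exact decay constant is not documented here, and the function name is hypothetical.

```python
from typing import Tuple

import numpy as np


def swirl_sketch(
    x: np.ndarray,
    y: np.ndarray,
    x0: float = 0.0,
    y0: float = 0.0,
    radius: float = 5.0,
    rotation: float = 0.0,
    strength: float = 5.0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical swirl: angle offset decays exponentially with distance to (x0, y0)."""
    # Polar coordinates relative to the swirl origin.
    dx, dy = x - x0, y - y0
    rho = np.hypot(dx, dy)
    # Extra angle: large near the origin, negligible far beyond `radius`.
    theta = np.arctan2(dy, dx) + rotation + strength * np.exp(-rho / radius)
    return x0 + rho * np.cos(theta), y0 + rho * np.sin(theta)
```

Since only the angle changes, distances to the swirl origin are preserved, and with strength = 0 and rotation = 0 the mapping is the identity.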
