RealMLP
An improved multilayer perceptron (MLP) with pre-tuned default settings for tabular data.
Functions
def select_from_config(config: Dict, keys: List)
Selects specific keys from a configuration dictionary.
Parameters:
config (Dict) - Configuration dictionary.
keys (List) - List of keys to select.
Returns:
Dict - Dictionary containing only the selected keys.
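The selection described above can be sketched as a dictionary comprehension. `select_from_config_sketch` is a hypothetical stand-in, not the library's implementation; in particular, silently skipping keys that are absent from `config` is an assumption.

```python
from typing import Any, Dict, List


def select_from_config_sketch(config: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in: keep only the requested keys.

    Assumption: keys missing from `config` are skipped rather than raising.
    """
    return {key: config[key] for key in keys if key in config}


config = {"lr": 0.04, "n_epochs": 256, "device": "cpu"}
selected = select_from_config_sketch(config, ["lr", "n_epochs", "missing"])
print(selected)
```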
def adapt_config(config, **kwargs)
Adapts a configuration dictionary with new parameters.
Parameters:
config (Dict) - Original configuration dictionary.
kwargs - New parameters to add or override.
Returns:
Dict - Modified configuration dictionary.
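A minimal sketch of the add-or-override behavior, assuming the function returns an updated copy and leaves the original dictionary untouched. The name `adapt_config_sketch` is hypothetical and the copy semantics are an assumption, not confirmed by the description above.

```python
import copy
from typing import Any, Dict


def adapt_config_sketch(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in: return a copy of `config` updated with `kwargs`.

    Assumption: the original dictionary is left unmodified.
    """
    new_config = copy.copy(config)  # shallow copy; nested values are shared
    new_config.update(kwargs)       # existing keys are overridden, new keys are added
    return new_config


base = {"lr": 0.04, "n_epochs": 256}
adapted = adapt_config_sketch(base, lr=0.01, batch_size=256)
print(adapted)
```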
def serialize(filename: Union[Path, str], obj: Any, compressed: bool = False, use_json: bool = False, use_yaml: bool = False, use_msgpack: bool = False)
Serializes an object to a file using various formats.
Parameters:
filename (Union[Path, str]) - Output file path.
obj (Any) - Object to serialize.
compressed (bool, optional, Default is False) - Whether to compress the file.
use_json (bool, optional, Default is False) - Whether to use JSON format.
use_yaml (bool, optional, Default is False) - Whether to use YAML format.
use_msgpack (bool, optional, Default is False) - Whether to use MessagePack format.
def deserialize(filename: Union[Path, str], compressed: bool = False, use_json: bool = False, use_yaml: bool = False, use_msgpack: bool = False)
Deserializes an object from a file.
Parameters:
filename (Union[Path, str]) - Input file path.
compressed (bool, optional, Default is False) - Whether the file is compressed.
use_json (bool, optional, Default is False) - Whether to use JSON format.
use_yaml (bool, optional, Default is False) - Whether to use YAML format.
use_msgpack (bool, optional, Default is False) - Whether to use MessagePack format.
Returns:
Any - Deserialized object.
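To illustrate how the format and compression flags might interact, here is a standard-library sketch of a round trip. `serialize_sketch` and `deserialize_sketch` are hypothetical stand-ins that cover only JSON and pickle; the YAML and MessagePack paths are left out because they need third-party packages, and falling back to pickle when no format flag is set is an assumption.

```python
import gzip
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def serialize_sketch(filename: Union[Path, str], obj: Any,
                     compressed: bool = False, use_json: bool = False) -> None:
    """Hypothetical stand-in covering only the pickle and JSON paths."""
    # Assumption: pickle is the fallback when no format flag is set.
    data = json.dumps(obj).encode("utf-8") if use_json else pickle.dumps(obj)
    opener = gzip.open if compressed else open
    with opener(filename, "wb") as f:
        f.write(data)


def deserialize_sketch(filename: Union[Path, str],
                       compressed: bool = False, use_json: bool = False) -> Any:
    """Inverse of serialize_sketch; the flags must match those used when writing."""
    opener = gzip.open if compressed else open
    with opener(filename, "rb") as f:
        data = f.read()
    return json.loads(data.decode("utf-8")) if use_json else pickle.loads(data)


obj = {"hidden_sizes": [256, 256, 256], "lr": 0.04}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "config.json.gz"
    serialize_sketch(path, obj, compressed=True, use_json=True)
    restored = deserialize_sketch(path, compressed=True, use_json=True)
print(restored)
```

In this sketch the reader must pass the same flags that were used for writing, since the file itself does not record its format.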
class Timer
Timer class for measuring execution time.
Methods:
start(self) - Start the timer.
pause(self) - Pause the timer.
get_result_dict(self) - Get timing results as dictionary.
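The start/pause interface suggests a timer that accumulates elapsed time over several intervals. The following is a sketch under that assumption; `TimerSketch`, its attributes, and the `"total"` key returned by `get_result_dict` are hypothetical and may not match the library's class.

```python
import time
from typing import Dict, Optional


class TimerSketch:
    """Hypothetical stand-in: accumulates elapsed time across start/pause cycles."""

    def __init__(self) -> None:
        self.elapsed = 0.0
        self.start_time: Optional[float] = None

    def start(self) -> None:
        # Assumption: calling start() on a running timer is a no-op.
        if self.start_time is None:
            self.start_time = time.perf_counter()

    def pause(self) -> None:
        # Add the interval since the last start() and stop the clock.
        if self.start_time is not None:
            self.elapsed += time.perf_counter() - self.start_time
            self.start_time = None

    def get_result_dict(self) -> Dict[str, float]:
        # Assumption: the key name "total" is illustrative only.
        return {"total": self.elapsed}


timer = TimerSketch()
timer.start()
time.sleep(0.01)
timer.pause()
time.sleep(0.01)  # not counted: the timer is paused
timer.start()
time.sleep(0.01)
timer.pause()
result = timer.get_result_dict()
print(result)
```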
class TimePrinter
Context manager for printing execution time.
Parameters:
desc (str) - Description for the timing operation.
Usage:
```python
with TimePrinter("Operation"):
    # code to time
    ...
```
class TabrQuantileTransformer(BaseEstimator, TransformerMixin)
Quantile transformer with noise addition for tabular data.
Parameters:
noise (float, optional, Default is 1e-3) - Noise level to add.
random_state (int, optional) - Random seed.
n_quantiles (int, optional, Default is 1000) - Number of quantiles.
subsample (int, optional, Default is 1_000_000_000) - Subsample size.
output_distribution (str, optional, Default is "normal") - Output distribution type.
Methods:
fit(self, X, y=None) - Fit the transformer.
transform(self, X, y=None) - Transform the data.
_add_noise(self, X) - Add noise to the data.
class ProcessPoolMapper
Process pool mapper for parallel processing.
Parameters:
n_processes (int) - Number of processes.
chunksize (int, optional, Default is 1) - Chunk size for mapping.
Methods:
map(self, f, args_tuples: List[Tuple]) - Map function over arguments in parallel.
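The `map` signature takes a list of argument tuples, which suggests that each tuple is unpacked into one call of `f`. A minimal sketch under that assumption, built on `multiprocessing.Pool.starmap`, is shown here. `ProcessPoolMapperSketch` is a hypothetical stand-in, and the serial shortcut for a single process is an assumption added for illustration.

```python
import multiprocessing as mp
from typing import Any, Callable, List, Tuple


class ProcessPoolMapperSketch:
    """Hypothetical stand-in: applies f(*args) to each tuple using a process pool."""

    def __init__(self, n_processes: int, chunksize: int = 1) -> None:
        self.n_processes = n_processes
        self.chunksize = chunksize

    def map(self, f: Callable[..., Any], args_tuples: List[Tuple]) -> List[Any]:
        # Assumption: with a single process, run serially and skip pool start-up.
        if self.n_processes <= 1:
            return [f(*args) for args in args_tuples]
        with mp.Pool(processes=self.n_processes) as pool:
            # starmap unpacks each tuple into positional arguments
            # and returns results in input order.
            return pool.starmap(f, args_tuples, chunksize=self.chunksize)


# Serial path shown here; the pool path is used when n_processes > 1.
serial_result = ProcessPoolMapperSketch(n_processes=1).map(pow, [(2, 3), (3, 2), (10, 2)])
print(serial_result)
```

When `n_processes` is greater than 1, `f` has to be picklable (a built-in or a module-level function), and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.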
def extract_params(config: Dict[str, Any], param_configs: List[Union[Tuple[str, Optional[Union[str, List[str]]]], Tuple[str, Optional[Union[str, List[str]]], Any]]]) -> Dict[str, Any]
Extracts parameters from configuration based on parameter configurations.
Parameters:
config (Dict[str, Any]) - Configuration dictionary.
param_configs (List) - List of parameter configurations.
Returns:
Dict[str, Any] - Extracted parameters.
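The type of `param_configs` is dense, so a sketch of one plausible reading may help. It assumes each entry is `(param_name, config_names)` or `(param_name, config_names, default)`, where `config_names` is `None`, a single key, or a list of candidate keys. `extract_params_sketch` and the lookup rules in its docstring are assumptions drawn from the signature, not the library's code.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

ParamConfig = Union[
    Tuple[str, Optional[Union[str, List[str]]]],
    Tuple[str, Optional[Union[str, List[str]]], Any],
]


def extract_params_sketch(config: Dict[str, Any],
                          param_configs: List[ParamConfig]) -> Dict[str, Any]:
    """Hypothetical reading of the signature, not the library's code.

    Each entry is (param_name, config_names[, default]):
    - config_names is None: look up param_name itself in `config`;
    - config_names is a str or list of str: use the first name found in `config`;
    - if nothing is found, use the default when given, else omit the parameter.
    """
    result: Dict[str, Any] = {}
    for param_config in param_configs:
        param_name, config_names = param_config[0], param_config[1]
        if config_names is None:
            config_names = [param_name]
        elif isinstance(config_names, str):
            config_names = [config_names]
        for name in config_names:
            if name in config:
                result[param_name] = config[name]
                break
        else:
            # No candidate name was present in config.
            if len(param_config) == 3:
                result[param_name] = param_config[2]
    return result


config = {"lr": 0.04, "wd": 0.0, "n_estimators": 100}
params = extract_params_sketch(config, [
    ("learning_rate", ["eta", "lr"]),  # first matching alias wins
    ("weight_decay", "wd"),            # single alias
    ("n_estimators", None),            # same name in config
    ("max_depth", None, 6),            # missing, falls back to default
    ("subsample", None),               # missing, no default: omitted
])
print(params)
```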
def combine_seeds(seed_1: int, seed_2: int) -> int
Combines two seeds into a single seed.
Parameters:
seed_1 (int) - First seed.
seed_2 (int) - Second seed.
Returns:
int - Combined seed.
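One way to combine two seeds deterministically is to hash them together. The sketch below uses SHA-256 from the standard library as a swapped-in technique for illustration; `combine_seeds_sketch` is hypothetical, and the library may combine seeds differently.

```python
import hashlib


def combine_seeds_sketch(seed_1: int, seed_2: int) -> int:
    """Hypothetical stand-in: hash both seeds into one 32-bit seed.

    Assumption: hashing is an illustrative choice, not the library's method.
    """
    digest = hashlib.sha256(f"{seed_1}:{seed_2}".encode("utf-8")).digest()
    # Use the first four bytes so the result fits in an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "little")


seed_a = combine_seeds_sketch(0, 1)
seed_b = combine_seeds_sketch(0, 1)
seed_c = combine_seeds_sketch(1, 0)
print(seed_a, seed_c)
```

Because the separator keeps the two inputs distinct, swapping the arguments generally yields a different seed, which is useful when one seed identifies a dataset split and the other a repetition.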
References:
David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data. arXiv:2407.04491 [cs.LG], 2025. https://arxiv.org/abs/2407.04491