Blobby Land Gordon J. Pace Chris Porter Christian Colombo March 2010 Who killed Mr Blobby? Mathematics and (partly) Computer Science are really all about one question: What can we say about objects which can be described in a finite manner? Compression is usually done by trying to find regularities in the source: Finding concise rules which generate part of the data bigger than the rule. Find common subparts and describe them once Mr Blobby is innocent Applications of compression are numerous: Smaller data for transmission and storage e.g. fit a whole movie at high resolution on a single DVD. Less resources if the description we are compressing is of a real life process: e.g. process chain optimisation. Allow you to win the 2010 ICTSA Programming Challenge Mr Blobby woz ’ere Given: an image with a 3-bit depth (8 colours) a number n To produce: not more than n blobs of paint approximating the original image in less than one minute. using as little as possible paint From Mr Ducky… … to Mr Blobby … to Mr Blobby In 16 blobs … to Mr Blobby … and no, I did not produce this automatically Some Interesting Notes The problem of finding an optimal solution is hard: There’s an infinite search space (allowing fractions, and blobs with centres outside the grid) But could be reduced to a finite one if we ignore the less paint is better constraint. Someone asked “is it NP-complete?” It is in NP (so certainly not worse than NPC), and seems to be in NPC, but am not sure. A brute force approach is out of the question Here’s an evaluation image (given with 5, 50, 100, 1000 blobs) Here’s what one team produced (1000 blobs) and another team (1000 blobs) and yet another (1000 blobs?!) The Teams The Teams: Industry-Based 1. 2. 3. 4. 5. 6. 7. CasaSoft: Karl Cassar, Mark Cassar Exigy Ltd: Christian Tabone, Clint Spiteri, Bernard Cachia, Jean-Paul Sultana Ixaris: JP Ebejer, Jimmy Borg, Brian Vella, Brian Attard Bason Gigi tal-MITA: Daniel D'Agostino CCBill EU: Emerson Farrugia, Christopher Torpuland, Briun, Wallace Wadge, Edward Mallia Holistic: Ruben Gatt, Matthew Falzon, Shaun Camilleri, Luke Rocco Atlas Insurance: Karl Fenech The Student Teams 8. 9. 10. 11. 12. 13. 14. 15. 16. Iš Vilniaus: Michael Debono, Andrew Borg Cardona papa charlie: Ian Agius, James Scicluna WM\{James}: Ingram Bondin, Kevin Falzon, Andrew Gauci daemon: Kurt Farrugia, Stefan Lia AJaZ: Andrew J Said, Julian Zammit C^D: Claire Dimech, Daniela Cauchi y.a.t.n.: Darren Demicoli TDGS: Kyle Pullicino, Adrian Duca, Kurt Portelli, Paul Felice B3ANi3S: David Borg, Duncan Borg, Thorsten Callus The Teams and their Solutions 01: CasaSoft Implemented using C# (.Net Framework 3.5) Based on a genetic algorithm. Reduced search space by specifying a boundary in which blobs can be placed and using bounded accuracy. Each generation of the genetic algorithm results in the best found blob according to the population which gives the best score. The genetic algorithm continues if there is still time available. The population size and the number of epochs per generation are calculated as a function of the bitmap size. An optimization to automatically calculate the largest square that can fit in the circle, enabling easy flagging of pixels to be painted. The fitness function does not to check for similar pixels, so as not to slow the speed of the algorithm. An Evaluation Problem (50, 100, 1000 blobs) And their solution (50 blobs) 02: Holistic Implementation language C# Used a basic Genetic Algorithm to approximate the result. No fine tuning and optimizations were done on this algorithm. Alas, all problems timed out 03: Ixaris Approach: Based on genetic algorithms (where genes are blobs) Initial population based on each colour layer being considered as one blob (and 25 random mutants generated on this) Genetic Operators – split (gives us two blobs), swap blobs, forceful remove, remove (if the blob has little effect), tweak radius, tweak centre position. Changing probability distribution for the above Multi-Threaded – reproduction of chromosomes done in parallel because of inherent independence from each other Issues Loads of magic variables (e.g. children count, max fit population, noise filter threshold, initial population, convergence limit, probability distributions for genetic operators) Interesting note: how to find good approximations for these (in a few hours)? (Neural Networks, Approximate Bayesian Computing for parameter estimation seem too onerous) As it very so often happens, last minute improvement fixes actually broke the solution (image reflected along x-axis – damn cartesian coordinate system) Don’t laugh. (Fixed at 8:20pm – too late). An Evaluation Problem (5, 50, 100 blobs) And their solution (50 blobs) 04: Gigi tal-MITA Implementation language: C Using a floodfill approach, identify connected areas of the same colour. Count the pixels in each area, order them in descending order of number of pixels, and calculate the centroid of each group. Fit circles into the image: Start at the centroid with a circle of maximum radius for that area. Decrease the radius until it has a pretty damn high level of inclusion (defined by a threshold variable) Inclusion is good if it includes that particular colour or even any other - as long as it's not white (background colour) Blob the circle with the best fit Repeat these steps, renumbering pixels which still have not been blobbed. On Sunday morning I realised I did not have enough time to complete the algorithm as I would have wanted to. Realising that submitting an average solution is better than submitting nothing at all, I went for a compromise - I simply used a circle of average radius. Although this leaves room for a good deal if inaccurate pixels especially if the area is noncircular, overall it has given me an accuracy of 50-69%, which I guess is reasonable for an approximate solution. An Evaluation Problem (5, 50, 100 blobs) Their Solution (100 blobs) 05: Pr0nsters Inc. CCBill EU Our solution has the following main stages: Start by detecting colour edges. Next, using the edges, generate a Voronoi tessellation diagram to create a number of polygons giving the locations where cells meet. These are transformed into circles. Clustering/filtering mechanism: Edge detection Voronoi generation Clustering/filtering Take the most frequent colour and use that as our base blob to cover the image. Order the circles by radius Calculate circle overlaps - anything more than a fixed percentage gets dropped (changing the threshold until the number of blobs is reduced as required). Stuff we didn’t manage to complete in time: A better clustering mechanism by use of the DBScan algorithm to detect “noise” circles. This was implemented but was too late to be merged in the main program An Evaluation Problem (5, 50, 100, 1000 blobs) Their Solution (1000 blobs) 07: Atlas Insurance Technologies: C#, .NET Framework 3.5, multithreading. Heuristic driven using image resampling: From the smallest downsample containing as-yet-unmatched pixels, randomly pick one unmatched pixel, and use it as the bounding square for a blob which would set it to the correct color. Repeat the following until the largest upsample has been processed: Bitmap is downsampled, by a factor of 2, to a minimum resample area of 1×1. And also upsampled, by a factor of 3, to a maximum area of 2048×2048. Compute the respective similarity gains obtained from transforming the bounding square by 1 pixel (only) in all directions, and pick the best transformation as the improved bounding square. Multiply the bounding square’s dimensions and bottom-left coordinate by the stepwise resample factor (2 or 3) so as to translate it onto the corresponding pixels in the next (larger) sample. Add the blob and go to the next unmatched pixel… Downsamples guarantee convergence of optimal bounding square for next blob in O (log2 n) time. Upsamples help by achieving sub-pixel precision on the original bitmap. An Evaluation Problem (5, 50, 100, 1000 blobs) Their Solution (1000 blobs) 08: Iš Vilniaus Implementation Language: C# Heuristics and Algorithms A modified maximum number of blobs is used in generation. For performance, a simpler similarity function to the original bitmap was employed that only considered the number of absolute correct pixels. Method 1: To generate timer-safe blobs, the center of the rectangular area defined by each component is used as the centre of a blob. Method 2: A Genetic algorithm Images larger than 100x100 pixels area were resized down to fit an area of 100x100. The image is split into single colour components using a floodfill algorithm. Components are considered largest first. Chromosomes were configurations of a random number of components up to the limit, and the number of splits for each component. Each split of the components became a blob and the Center of Gravity calculated from the area of the split sub-component. The radius was calculated from the width and height of the split and also from the difference to the center of gravity. The similarity function is used as a fitness function. Termination upon timeout or convergance. The best solution is used Evaluation Problem (5, 20, 100 blobs) Their Solution (5 blobs) 09: Papa Charlie Language of choice: Java The algorithm: Divide the bitmap into components, each holding a colour area; The similarity calculator was used to find the best result on a component; Until the maximum number of blobs is reached, repeat the following procedure: Remove components with a small number of correct pixels Subdivide components with the most wrong pixels Heuristics to cater for special cases were also created. Another Evaluation Problem (5, 50, 100 blobs) Their solution with 100 blobs 10: WM\{James} A suite of 3 algorithms, run separately, and the best result chosen. Colour Based Overapproximation: For each colour (starting with the most frequent one), the centroid of the extremal points of the points where the colour appears is calculated, and a blob drawn covering the whole area. Very fast but crude algorithm. Cluster Based Overapproximation: Contiguous clusters of colours are ordered by size, largest first. The algorithm then assigns blobs of colour to each cluster until the blobs are exhausted, using the centroid approach. Increased cost to generate the clusters. Cluster Based Overapproximation and Merging: Build upon previous by also merging clusters which have not been assigned a blob to an existing blob. Result is also checked with a different blob ordering (biggest first). There are also minor trigonometric optimisations. Another Evaluation Problem (with 5, 20, 100 blobs) Their solution with 20 blobs 11: daemon “We used java to implement our blobify. The approach we took was a genetic algorithm.” All the problems timed out 12: AJaZ Tackled the problem in C# on .NET Framework 3.5 If there are enough blobs, put a blob of radius 0.0001 (we liked the way such a blob can cover a pixel!) at the centre of each pixel. If not, use a Genetic Algorithm Large images (>2000 pixels), are scaled down to around 1000 pixels. Places blobs randomly (with no colour) Colour is calculated to be the most common color of the visible part of the circle Mutation: Blobs with low amount of correct pixels are removed Blobs that overlap on a blob of the same color are removed Other Mutations: Change position, Change Radius, Change Blob order chances for each one are variable and change depending on which produces the most "better" mutations. (Sort of tailor making the mutation for every different bitmap) The last 5 seconds spend mutating the best solution Stopping condition: 57 seconds (excluding input and output) Guess the original image (1000 blobs) Che Guevara (5, 50, 100, 1000 blobs) 13: C^D The blobmap is divided into monochrome blobmaps. With each monochrome blobmap: Extract a rectangle from this map such that all colored pixels are within it. Create a blob with radius half the smallest side of the rectangle. If half the pixels in the rectangle are coloured then use the blob Otherwise, split the monochrome map and repeat. Different splitting mechanisms were used. The blobs used are the biggest ones generated. The smiley faces ended up as… this using 100 blobs 14: y.a.t.n. Implementation language: C Randomly place blobs in areas of the image which use the same colour. The best blob (in terms of correct pixels and paint used) is kept. The remaining blobs are randomly refined by adding blobs in positions which enclose the same colour of the blob being put and again assessing the similarity. Repeat until the algorithm converges or until 50 seconds have elapsed. Guess who this is (1000 blobs) 5, 50, 100, 1000 blobs 15: TDGS Language used: Java The size of the input bitmap is increased to a square with sides of length a power of two The bitmap is then split into multiple grids of various sizes to determine the occurrence of colours at various areas of the image. In those areas where the occurrence of a colour is higher than a certain threshold, a blob is added to cover that area. The grids in an outward spiral fashion (“just to be cool”) Another Evaluation Problem (5, 50, 100, 1000 blobs) 1000 blobs 16: B3ANi3S Tried to emulate the way a human would try to solve the problem: Recognize the basic shapes in the picture by reading the bitmap line by line, storing the coordinates of the start and end points of each line having the same colour, and eliminating errors through sampling. From the coordinates collected earlier, calculate the optimal blob position and size which will fill all the shape with least overflow, and if the total number of blobs is exceeded, remove the smallest blobs. Order the blobs by checking the perimeter of the shape in the original image and seeing which colour is best on top. No working solution submitted Some other impressive solutions More impressive solutions Some other impressive solutions Some other impressive solutions Some other impressive solutions Some other impressive solutions Some other impressive solutions Some other impressive solutions But who won? Student Teams # Team Name Score Student Teams # Team Name 09 papa charlie Score 720 Student Teams # Team Name Score 08 Iš Vilniaus 765 09 papa charlie 720 Student Teams # Team Name Score 10 WM\{James} 08 Iš Vilniaus 765 09 papa charlie 720 1075 Student Teams # Team Name Score 15 TDGS 1790 10 WM\{James} 1075 08 Iš Vilniaus 765 09 papa charlie 720 Student Teams # Team Name Score 12 AJaZ 2685 15 TDGS 1790 10 WM\{James} 1075 08 Iš Vilniaus 765 09 papa charlie 720 Industry Teams # Team Name Score Industry Teams # Team Name 04 Gigi tal-MITA Score 455 Industry Teams # Team Name Score 03 Ixaris 465 04 Gigi tal-MITA 455 Industry Teams # Team Name Score 05 CCBill EU 03 Ixaris 465 04 Gigi tal-MITA 455 1595 Industry Teams # Team Name Score 01 CasaSoft 1710 05 CCBill EU 1595 03 Ixaris 465 04 Gigi tal-MITA 455 Industry Teams # Team Name Score 07 Atlas Insurance 3380 01 CasaSoft 1710 05 CCBill EU 1595 03 Ixaris 465 04 Gigi tal-MITA 455 Merging the Results # Team Name Score 07 Atlas Insurance 3320 12 AJaZ 1260 01 CasaSoft 840 05 CCBill EU 830 15 TDGS 690 Prosit tal-Programmi! Hope to see you again next year… To check the evaluation problems were unchanged: Take the encrypted file Uncompress using the following password: 1. 2. Il-materjal li inbeżaq mill-impatti ikkrejaw krateri oħra sekondarji. Xi krateri għandhom kisja ta' l-eġekta (materjal) li jifraq għal mijiet ta' kilometri mill-punt tal-impatt. Rename to .zip Uncompress using the following password: 3. 4. Il-Qamar huwa l-unika satellita naturali tal-Art. Jorbita il-pjaneta tagħna minn 350,000 għal 400,000 km, jakkonpanja l-Art fittidwir madwar l-istilla Sol (Xemx).Ġieli jissejjaħ Luna (millKelma Latina għal qamar), jew Selene. Thanks to…

Descargar
# Document