## Useful Perl

Ever want to have a simple script/program that:

1. takes a few arguments
2. reads in a text file
3. checks to see if the arguments are already in the text file
4. adds them if they are not, and dies if they are?

If yes, then you might find the following little snippet of Perl I hacked up for work useful:


```perl
#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl KEY VALUE
die "usage: $0 KEY VALUE\n" unless @ARGV == 2;
my ($key, $value) = @ARGV;
my $file = 'test.txt';
my %values;

# Read the existing "key: value" lines (the file may not exist yet).
if (open my $in, '<', $file) {
    while (<$in>) {
        chomp;
        next unless /\S/;                   # skip blank lines
        my ($k, $v) = split /:/, $_, 2;     # split on the first colon only
        $v //= '';
        s/^\s+|\s+$//g for $k, $v;          # trim both fields
        die "Already in List\n" if $k eq $key;
        $values{$k} = $v;
    }
    close $in;
}

# Add the new pair and rewrite the whole file, sorted by key.
$values{$key} = $value;
open my $out, '>', $file or die "Cannot write $file: $!\n";
for my $x (sort keys %values) {
    printf $out "%11s: %s\n", $x, $values{$x};
}
close $out;
```

## CUDA Tests

Lately I've taken an interest in CUDA and its associated programming interfaces. I wanted to test just how powerful these video cards are at crunching large amounts of arithmetic, and thought the results were worth sharing. The test code looks like this:

```cuda
// includes, system
#include <stdio.h>
#include <stdlib.h>

// includes, CUDA
#include <cuda_runtime.h>

// CUDA KERNEL: each thread trial-divides one odd candidate and
// overwrites it with 0 if it finds a divisor.
__global__ void deviceKernel(unsigned long long *in_a, unsigned long long N)
{
    unsigned long long idx = (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        unsigned long long x = in_a[idx];
        // ceil(sqrt(x)) in double precision: the largest divisor worth trying
        unsigned long long smax = __double2ull_ru(__dsqrt_ru(__ull2double_ru(x)));
        for (unsigned long long i = 3; i <= smax; i += 2) {
            if (x % i == 0) {
                in_a[idx] = 0;  // composite
                break;
            }
        }
    }
}

// MAIN
int main(int argc, char **argv)
{
    unsigned long long *search_h, *result_h;  // pointers to host memory
    unsigned long long *search_d;             // pointer to device memory
    unsigned long long search_max, i, N;

    if (argc < 2 || (search_max = strtoull(argv[1], NULL, 10)) < 3) {
        fprintf(stderr, "usage: %s SEARCH_MAX (>= 3)\n", argv[0]);
        return 1;
    }

    N = (search_max - 1) / 2;  // number of odd candidates in [3, search_max]
    size_t size = N * sizeof(unsigned long long);
    search_h = (unsigned long long *)malloc(size);
    result_h = (unsigned long long *)malloc(size);
    cudaMalloc((void **)&search_d, size);

    // initialization of host data: search_h[k] = 2k + 3
    for (i = 3; i <= search_max; i += 2)
        search_h[(i - 1) / 2 - 1] = i;

    // copy data from host to device
    cudaMemcpy(search_d, search_h, size, cudaMemcpyHostToDevice);

    const int threadsPerWarp = 32;
    int blockSize = 8 * threadsPerWarp;                     // 256 threads per block
    int nBlocks = (int)((N + blockSize - 1) / blockSize);  // round up
    deviceKernel<<<nBlocks, blockSize>>>(search_d, N);     // fire off the kernel!
    cudaError_t err = cudaDeviceSynchronize();             // wait for the kernel to finish
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(result_h, search_d, size, cudaMemcpyDeviceToHost);

    // Nonzero entries survived trial division; add 1 for the prime 2.
    unsigned long long count = 1;
    for (i = 0; i < N; i++)
        if (result_h[i] != 0)
            count++;
    printf("%llu primes <= %llu\n", count, search_max);

    free(search_h);
    free(result_h);
    cudaFree(search_d);
    return 0;
}
```

Pretty straightforward.
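To sanity-check the GPU results on small inputs, it helps to have a plain CPU version of the same trial division. The following is a minimal C sketch, not part of the original test: the names `mark_composites_cpu` and `count_primes_upto` are hypothetical, and it assumes the same odd-only layout as `main()` (`search[k]` holds `2k + 3`). It stops at ⌊√x⌋ rather than the kernel's rounded-up bound, which does not change which candidates are marked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Host-side reference for the CUDA kernel (hypothetical helper names).
 * Like deviceKernel, it overwrites composite candidates with 0. */
void mark_composites_cpu(unsigned long long *search, unsigned long long n)
{
    for (unsigned long long k = 0; k < n; k++) {
        unsigned long long x = search[k];
        /* Odd trial divisors while i*i <= x; "i <= x / i" avoids overflow. */
        for (unsigned long long i = 3; i <= x / i; i += 2) {
            if (x % i == 0) {
                search[k] = 0;  /* composite */
                break;
            }
        }
    }
}

/* Count the primes <= search_max using the same odd-only layout as main():
 * search[(x - 1) / 2 - 1] = x, i.e. search[k] = 2k + 3. */
unsigned long long count_primes_upto(unsigned long long search_max)
{
    if (search_max < 2)
        return 0;
    if (search_max < 3)
        return 1;                              /* only the prime 2 */

    unsigned long long n = (search_max - 1) / 2;
    unsigned long long *search = malloc(n * sizeof *search);
    if (search == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    for (unsigned long long k = 0; k < n; k++)
        search[k] = 2 * k + 3;

    mark_composites_cpu(search, n);

    unsigned long long count = 1;              /* 2 is prime but never tested */
    for (unsigned long long k = 0; k < n; k++)
        if (search[k] != 0)
            count++;
    free(search);
    return count;
}
```

Running `mark_composites_cpu` on a copy of `search_h` and comparing it element by element with `result_h` is a cheap correctness check before timing larger runs.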
The program copies the search list to the device in one block and copies it back in one block after all the work is done. This is obviously not the best way to do things, but for testing purposes the copies are not what limits the larger runs. The runtime curve looks roughly like O(N log N), even though the worst-case work of trial division is closer to O(N√N): each odd candidate x is tested against up to about √x/2 odd divisors, but most composites exit after the first few. What's amazing to me is that this code takes only one second to find every prime number less than ten million! The graph shows that runtime setup and the copy operations are the limiting factor up to a search range of about one million. From there, this naive implementation takes off, crunching an absolutely obscene amount of arithmetic, on `unsigned long long`s no less!

The source for the test is here: source. Sorry, I put the graph together by hand.

## Spam

So, those of you who have websites, and maybe some who do not, know all about this "SEO" B-S that sloppy devs and scammers use to try to generate traffic. Google's PageRank uses the links pointing to a site to rate its relevance and popularity. In essence, Google periodically crawls the entire internet looking for links, counts the ones that point to each site (weighting each link by the rank of the page it comes from), and uses the result to order all those sites. So if you have a site that, for example, sells Ugg boots or NFL jerseys or Canadian prescriptions, you want as many websites as possible to link to you, so that your site ranks higher.

What people are doing nowadays is spamming the crap out of the comments on everyone's personal blogs (i.e., mine) so that their comments create links back to their websites. This is crap. Today (April Fools' Day), I have gotten hundreds of spam comments with zero content. I manually moderate (allow/disallow) these, and frankly it's a problem. I feel this level of trash and spam is in essence a kind of psychological DoS attack on my person. This is _my_ personal website, for my technical explorations and use, with my own full name across the top.
I do not appreciate this automated crap filling my inboxes. As such, anyone wishing to post a legitimate comment will now need to fill out a CAPTCHA. I apologize. To everyone else: Piss off, no one here wants Ugg boots or Cialis.

## LaTeX?

If everything is working, then the Schrödinger equation is:

$$i\hbar\frac{\partial}{\partial t}\Psi(\mathbf{r},t) = \hat{H}\,\Psi(\mathbf{r},t)$$

I'll be curious to see whether this has any effect on, or conflicts with, the PHP plugin.

## C2000 Launchpad Assembly-Only Software PWM Tutorial

So, I've completed an assembly-only software PWM program for the C2000 Launchpad. C2000 is actually a remarkably strong platform for assembly-only programming, with powerful addressing modes, many registers, and many powerful instructions. To me the platform is reminiscent of the MSP430, which I had a wonderful time hacking on with the FOSS toolchain and Vim. Being forced to use CCS and controlSUITE is a minor annoyance (let's face it, they're the only practical way to develop for this platform), but their integration makes up for most of their shortcomings. Has anyone tried to build Cortex assembly projects with GNU tools and OpenOCD? Thumb-2 will make any human pull their hair out.
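The PWM scheme in main.asm and timer1irq is easier to see in C. This is a host-side model written for illustration only: the names `pwm_gpadat`, `pwm_advance`, `PWM_PERIOD` and `DUTY_WRAP` are hypothetical, and the model idealizes the timing (on the board, the main loop and the timer interrupt run asynchronously). It assumes what the assembly does: a counter that wraps at 200h, one duty value per LED that steps by one modulo 100h each period, and active-low LEDs, so a 1 bit in the GPADAT word turns an LED off.

```c
#include <stdint.h>

#define PWM_PERIOD 0x200u  /* counter (AR7) wraps at 200h         */
#define DUTY_WRAP  0x100u  /* duty values (AR0-AR3) wrap at 100h  */

/* GPADAT value (low 4 bits, as built in AR6) for a given counter value.
 * LEDs are active-low: a set bit turns that LED off. */
uint16_t pwm_gpadat(uint16_t counter, const uint16_t duty[4])
{
    uint16_t out = 0;                 /* all LEDs on at the start of a period */
    for (int led = 0; led < 4; led++)
        if (counter >= duty[led])     /* past this LED's duty: switch it off */
            out |= (uint16_t)(1u << led);
    return out;
}

/* End-of-period update: every duty value steps by one, modulo DUTY_WRAP. */
void pwm_advance(uint16_t duty[4])
{
    for (int led = 0; led < 4; led++)
        duty[led] = (uint16_t)((duty[led] + 1u) % DUTY_WRAP);
}
```

In this model an LED is on for duty/200h of each period, so the largest duty value, FFh, gives just under 50% and a duty of 80h gives 25%.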
Whatever my feelings about the platform and environment, without further ado, here is the code:

## Main.asm

```asm
; main.asm
; By Daryl Metzler djmetzle@ncsu.edu
;
; MAIN
;
        .sect ".text"           ; Program code
        .global _main
_main:

; Memory howto (for the watchdog):
; a simple demo of direct addressing, i.e. how to load data from memory.
; DP selects the 64-word page the data is on,
; then the MOV instruction supplies the offset within that page.
;       MOVW DP, #variable
;       MOVL XAR0, @variable

; Turn off the watchdog timer
wd_disable:
        ; The watchdog must be disabled before anything else,
        ; or it will reset the device shortly after boot
        EALLOW                  ; Enable EALLOW-protected register access
        MOVZ DP, #7029h>>6      ; Set data page for the WDCR register
        MOV  @7029h, #0068h     ; Set the WDDIS bit in WDCR to disable the WD
        EDIS                    ; Disable protected register access

; Interrupts
intenable:                      ; Allow interrupts
        EALLOW
        CLRC VMAP               ; Map the interrupt vectors from boot ROM to 0x0
        EINT                    ; Enable maskable interrupts globally
        EDIS

; PLL
pll_enable:                     ; Turn on the PLL
        EALLOW
        MOVZ DP, #7021h>>6      ; Data page for PLLCR
        MOV  @7021h, #0x000C    ; Set PLLCR[DIV] = 12
        MOVZ DP, #7011h>>6      ; Data page for PLLSTS
        OR   @7011h, #0x0100    ; Set PLLSTS[DIVSEL] = 2
        EDIS

; CPU Timer 1
timer_enable:
        EALLOW
        OR   IER, #0x1000       ; Enable INT13 (CPU Timer 1) in IER
        MOVZ DP, #0xC0C>>6      ; Data page for TIMER1TCR
        OR   @0xC0C, #0x4000    ; Set the interrupt enable bit in TIMER1TCR
        MOVZ DP, #0xC0A>>6      ; Data page for TIMER1PRD
        MOV  @0xC0A, #0x500     ; Low word of the timer period
        MOVZ DP, #0xC0B>>6      ; Data page for TIMER1PRDH
        MOV  @0xC0B, #0x0       ; High word of the timer period
        EDIS

; GPIO <blinkys>
gpio_enable:
        EALLOW
        MOVZ DP, #6F86h>>6      ; GPAMUX1
        AND  @6F86h, #0xFF00    ; Configure GPIO0-3 as plain GPIO
        MOVZ DP, #6FC2h>>6      ; GPASET
        OR   @6FC2h, #0x000F    ; Set GPIO0-3
        MOVZ DP, #0x6F8A>>6     ; GPADIR
        MOV  @6F8Ah, #0x000F    ; Make GPIO0-3 outputs
        MOVZ DP, #6FC0h>>6      ; GPADAT
        AND  @0x6FC0, #0xFFF0   ; Clear GPIO0-3 (LEDs on)
        OR   @0x6FC0, #0xFFFA   ; Set GPIO1 and GPIO3 (mask FFFAh)

; Initialize the software PWM state
        MOV  ACC, #0x0
        MOVL XAR0, #0x0         ; Duty value for LED 0
        MOVL XAR1, #0x40        ; Duty value for LED 1
        MOVL XAR2, #0x80        ; Duty value for LED 2
        MOVL XAR3, #0xC0        ; Duty value for LED 3
        MOVL XAR6, #0x0F        ; Next GPADAT value (all LEDs off)
        MOVL XAR7, #0           ; PWM counter, incremented by timer1irq

loop:
        MOVW AL, AR7
        CMP  AL, AR0            ; Counter past LED 0's duty value?
        B    onp0, LT
        OR   AR6, #0x1          ; Yes: turn LED 0 off
onp0:   CMP  AL, AR1
        B    onp1, LT
        OR   AR6, #0x2
onp1:   CMP  AL, AR2
        B    onp2, LT
        OR   AR6, #0x4
onp2:   CMP  AL, AR3
        B    onp3, LT
        OR   AR6, #0x8
onp3:   NOP
        CMP  AR7, #200h         ; End of the PWM period?
        B    swpwmwrap, GEQ
        B    loop, UNC

swpwmwrap:
        MOVL XAR7, #0x0         ; Restart the PWM counter
        MOV  AR6, #0h           ; Clear GPIO0-3 (LEDs on)
        ; Increment the AR0-AR3 duty values modulo 100h,
        ; which pulses their respective LEDs
        MOV  AL, AR0
        ADD  AL, #1
        CMP  AL, #100h
        B    modulus0, LT
        MOV  AL, #0
modulus0:
        MOVW AR0, AL
        MOV  AL, AR1
        ADD  AL, #1
        CMP  AL, #100h
        B    modulus1, LT
        MOV  AL, #0
modulus1:
        MOVW AR1, AL
        MOV  AL, AR2
        ADD  AL, #1
        CMP  AL, #100h
        B    modulus2, LT
        MOV  AL, #0
modulus2:
        MOVW AR2, AL
        MOV  AL, AR3
        ADD  AL, #1
        CMP  AL, #100h
        B    modulus3, LT
        MOV  AL, #0
modulus3:
        MOV  AR3, AL
        B    loop, UNC          ; Loop back after falling through
```

As you can see, main relies on a lot of magic numbers, and most of its work is bringing the hardware alive and configuring it. The result is that the LEDs pulse and blink using PWM. There is nothing sophisticated about what is done with that PWM: each duty value increments by one every period, modulo half the PWM period (100h out of 200h counts). The C2000 Launchpad's LEDs are wired in such a way that even at a 25% duty cycle, it is difficult for the eye to pick out the "dim" LED from one at full intensity.

## Interrupts.asm

```asm
        .sect ".text"

        .global int1irq
int1irq:
        NOP
        IRET

        .global int2irq
int2irq:
        NOP
        IRET

        .global timer1irq
timer1irq:
        MOV  AL, AR7
        ADD  AL, #0x1
        MOV  AR7, AL            ; Advance the PWM counter
        PUSH DP
        MOVZ DP, #6FC0h>>6      ; GPADAT
        MOVW @6FC0h, AR6        ; Light up the LEDs by AR6 value
        POP  DP
        IRET
```

The interrupt routine timer1irq does little more than increment the PWM counter kept in AR7 and copy the current LED value in AR6 to the GPADAT register. Feel free to check out the directory to grab the code for yourself. I hope someone finds this helpful, and that this Launchpad gains a little more attention and steam in the community.

## C2000 Launchpad on Fedora

I recently received my C2000 Launchpad (from Mouser) and, after much headache, finally have things up and working.
Since the process was a bit of a hassle, I figured I would write it up as a tutorial for bootstrapping the development process on Fedora 17 (as you may know, my distro of choice). Hopefully this will fill a niche and get Fedora users from an unopened box all the way to a custom Blinky application.

## First, Hit the Following Links

TI's pages:

- TI's C2000 Launchpad page
- TI's Wiki

Grab the following documentation:

- C2000 Launchpad Product Brief
- The C2000 Launchpad User Guide

And the following software:

- Code Composer Studio (get the current version for Linux)
- controlSUITE

At this point you should be up to speed on what you have. The example that comes pre-programmed is pretty interesting. In order to get serial working (if you roll a custom Linux kernel like me), make sure the USB FTDI Single Port Serial Driver is built into your kernel, along with the necessary composite device support. I wasn't getting /dev/ttyUSB0 at first.

## CCS Install

Installing CCS should be straightforward: grab the gzipped archive, extract it, and run the included binary. When installing, I put my copy in /opt/ti. For convenience, I will refer to all paths as if you installed yours at /opt/ti, so I hope there's no confusion there. During installation, choose a custom install and include, at a minimum, the C2000 libraries and the XDS100 emulator support. Leaving out the rest should save you some precious space on /.

## controlSUITE Install

Next, install controlSUITE with Wine into the same parent directory as CCS (again /opt/ti). controlSUITE itself isn't going to run properly on Linux, which is okay, as we only really need the headers and drivers (more on those later).

## First Impressions

Begin by firing up CCS and selecting a license (I simply chose Evaluate). Next, you'll find yourself at the Welcome screen.
Select Resource Explorer. At the bottom of the screen you should find "Configure Resource Explorer to discover examples, documentation and generates a resource package". Add the top-level controlSUITE directory here (/opt/ti/controlSUITE). Now search your new package for "launchpad" to find Example_F2802xLaunchPadDemo, literally the only Launchpad-specific example code in controlSUITE.

## The Example

Now for some interesting stuff. Try building this project by clicking the little hammer "Build" icon. When I started, many hiccups and errors came up.

First, set up the target configuration. Click File->New->Target Configuration File and click through the defaults in the dialog box that pops up. On the next screen, select XDS100v2 USB Emulator, then search for 28027 to find the TMS320F28027 device. Save this config and test the connection. Hopefully the test succeeds. If not, check lsusb, lsmod, and your kernel config for the right drivers, to make sure everything is in line with hardware support.

Next, let's fix a pesky bug. In your project explorer, browse to:

`Includes->/opt/ti/***/C2000_Launchpad->F2802x_common->cmd->F28027.cmd`

In this file, edit line 117 to change the last period into a comma, a bug I found out about in this post.

## Linker Path

Next, there is a problem with the include path for the linker. If an error comes up complaining about IQmath.lib, add the path manually. Right-click the project in the explorer and select "Show Build Settings". Under Build->C2000 Linker->File Search Path, remove the bare entry for IQmath.lib and add a new entry with the full path:

`/opt/ti/controlSUITE/development_kits/C2000_LaunchPad/f2802x_common/lib/IQmath.lib`

Your build and debug should now succeed!

## Now for Some Custom Blinking

Rather than reinvent the wheel, I'll point you here. Pay special attention to the update at the bottom in order to pull in the headers and driverlib.lib. Cheers!
## AVX Extensions

### Testing out code inclusion with the syntax highlighter backend

AVX (Advanced Vector Extensions) is a newer set of vector instructions that widens the 128-bit xmm registers of SSE to 256-bit ymm registers. The original AVX performs floating-point arithmetic at the full 256-bit width; 256-bit integer arithmetic arrived later with AVX2.

```asm
        .file   "main.c"

        .data
        .align  32
intset1:                                # eight single-precision inputs
        .float  1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8

        .bss
        .align  32
intans:                                 # space for the eight float results
        .zero   32

        .text
        .align  4
        .globl  main
        .type   main, @function
main:
        pushq   %rbp
        movq    %rsp, %rbp
        vzeroall
avx:
        vmovdqa intset1(%rip), %ymm0    # load all eight floats (32-byte aligned)
        vmovdqa intset1(%rip), %ymm1
        nop
        vmulps  %ymm0, %ymm1, %ymm2     # eight single-precision multiplies
        vmovdqa %ymm2, intans(%rip)     # store the products
        vzeroupper                      # avoid AVX/SSE transition penalties
        movl    $0, %eax                # return 0
        popq    %rbp
        ret
```


I've fixed this code up, so to whoever might have been following along: I'm now seeing the correct output when I run this in the debugger.

There's some interesting stuff here. The same approach carries over to SSE if your kernel doesn't support AVX (tested on the web server).
We move the eight (or four) floats into the vector registers ymm0 and ymm1 (or xmm0 and xmm1 for SSE) and vector-multiply them into ymm2 (or xmm2). The result is then stored in intans in .bss. Really, there's no way to "see" this code work without the debugger, but the power of doing eight floating-point multiplies with a single instruction is amazing.
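For comparison, here is a rough C sketch of the same computation using compiler intrinsics instead of hand-written assembly. It assumes GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`), and the `square8*` names are hypothetical, not from the original code. It squares eight floats, taking the 256-bit AVX path when the CPU and OS support it and otherwise falling back to two 128-bit SSE multiplies, mirroring the ymm/xmm split described above; non-x86-64 targets get a scalar loop.

```c
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* AVX path: one 256-bit load, one vmulps, one store. */
__attribute__((target("avx")))
static void square8_avx(const float *in, float *out)
{
    __m256 v = _mm256_loadu_ps(in);             /* eight floats into a ymm register */
    _mm256_storeu_ps(out, _mm256_mul_ps(v, v)); /* eight multiplies, one instruction */
}

/* SSE fallback: two 128-bit multiplies of four floats each (xmm registers). */
static void square8_sse(const float *in, float *out)
{
    for (size_t i = 0; i < 8; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(v, v));
    }
}
#endif

/* Square eight floats, choosing AVX at run time when CPU and OS support it. */
void square8(const float *in, float *out)
{
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx"))
        square8_avx(in, out);
    else
        square8_sse(in, out);
#else
    for (size_t i = 0; i < 8; i++)  /* portable scalar fallback */
        out[i] = in[i] * in[i];
#endif
}
```

The unaligned load/store intrinsics are used so the caller's arrays need no 32-byte alignment; with aligned buffers, `_mm256_load_ps`/`_mm256_store_ps` match the `.align 32` data in the assembly more closely.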